The editing of a combinatorial object is the alteration of some of its elements such that the resulting object satisfies a certain fixed property. The edit problem for graphs, when edges are added or deleted, was first studied independently by the authors and Kézdy [4] and by Alon and Stav [3]. In this paper, a generalization of graph editing is considered for multicolorings of the complete graph as well as for directed graphs. Specifically, we investigate the number of edge-recolorings that suffice to make any edge-colored complete graph satisfy a given hereditary property. The theory for computing the edit distance is extended using random structures and so-called types, or colored homomorphisms, of graphs.
1. Introduction

The combinatorial editing problem is, in general, the problem of finding the smallest number of element-changes such that the resulting combinatorial object satisfies a certain fixed property. The simplest class of objects for which the editing problem has been considered is a set of sequences. In fact, the first detailed algorithmic study of editing was motivated by bioinformatics, where sequences over finite alphabets are considered and editing corresponds to changes of the elements of a sequence, depicting mutations in biomolecules. When the desired property consists of a single sequence, studying editing corresponds to investigating the Hamming distance between sequences. The notion of graph editing was introduced by the authors and Kézdy [4] and independently by Alon and Stav [3]. The question considered was: "How many edges does one need to add or delete in a given graph so that the result belongs to a given class of graphs?" The authors showed in [4] that the answer to this question for hereditary classes can be expressed in terms of the so-called binary chromatic number (also called the colouring number) of the family. Alon and Stav [3] showed that the largest distance from a hereditary property is achieved, asymptotically, by an Erdős-Rényi random graph.

In this paper, the generalized theory is developed for editing of edge-colored complete graphs and digraphs. The main result for edge-colored graphs, Theorem 4, is in terms of two parameters: the so-called weak and strong r-ary chromatic numbers. The main result for directed graphs, Theorem 18, is in terms of two parameters: the weak and strong directed chromatic numbers. In each case, the results come from more general theorems, Theorems 8 and 23 respectively, which generalize the graph notion of types to the above combinatorial objects. The analysis is based on a version of Szemerédi's regularity lemma, which we state as Theorem 12 (see [5] for a proof of Theorem 12), applied to an Erdős-Rényi-type random edge-colored graph or random digraph, respectively. General bounds on the edit distance function are given, as well as some editing algorithms and computing methods, all of which result from Theorems 8 and 23.

Faculty Professional Development grant.

The paper is structured as follows. Section 2 deals with the case of multicolorings of the edges of complete graphs. Section 3 deals with the case of directed graphs. In each of these sections we provide definitions, editing algorithms and examples, as well as some general theory on the edit distance function. Most proofs are presented in Subsection 2.9 and in Subsection 3.10.

2. Multicolorings of the complete graph 



2.1. Basic definitions 

An equipartition of a finite set is a partition in which any two parts differ in size by at most one.

For a complete graph on vertex set $V$ and a finite set $Q$, we say that a $Q$-coloring, or more specifically a $Q$-edge-coloring, of this graph is a pair $G=(V,c)$, where $c:\binom{V}{2}\to Q$. Since it suffices to let $Q=\{1,\dots,r\}$ for some integer $r$, we refer to a $\{1,\dots,r\}$-edge-coloring of a complete graph simply as an r-graph. For any r-graph $G$, disjoint vertex sets $V_i$ and $V_j$, and color $\rho\in\{1,\dots,r\}$, $E_\rho(V_i)$ denotes the set of edges colored $\rho$ with both endpoints in $V_i$, and $E_\rho(V_i,V_j)$ denotes the set of edges colored $\rho$ with one endpoint in $V_i$ and the other in $V_j$. The density vector of $V_i$ is the r-dimensional vector $\mathbf p=(p_1,\dots,p_r)$, where $p_\rho=|E_\rho(V_i)|/\binom{|V_i|}{2}$ for $\rho=1,\dots,r$. The density vector of the pair $(V_i,V_j)$ is the r-dimensional vector $\mathbf p=(p_1,\dots,p_r)$, where $p_\rho=|E_\rho(V_i,V_j)|/(|V_i|\,|V_j|)$ for $\rho=1,\dots,r$. Note that for such density vectors, $\sum_{\rho=1}^r p_\rho=1$.
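To make the two density vectors concrete, here is a minimal sketch. It assumes a representation that is not from the paper: vertices are integers and an r-graph is a Python dict mapping each pair `(u, v)` with `u < v` to a color in `1..r`; the function names are hypothetical.

```python
from itertools import combinations

# Assumed representation (not from the paper): an r-graph is a dict {(u, v): color}
# with u < v and colors in 1..r.

def density_vector_set(coloring, Vi, r):
    """Density vector of V_i: entry rho-1 is |E_rho(V_i)| / C(|V_i|, 2); needs |V_i| >= 2."""
    counts = [0] * r
    pairs = list(combinations(sorted(Vi), 2))
    for e in pairs:
        counts[coloring[e] - 1] += 1
    return [c / len(pairs) for c in counts]

def density_vector_pair(coloring, Vi, Vj, r):
    """Density vector of disjoint (V_i, V_j): entry rho-1 is |E_rho(V_i, V_j)| / (|V_i| |V_j|)."""
    counts = [0] * r
    for u in Vi:
        for v in Vj:
            counts[coloring[(min(u, v), max(u, v))] - 1] += 1
    return [c / (len(Vi) * len(Vj)) for c in counts]

# A small 3-graph on vertices 0..3.
G = {(0, 1): 1, (2, 3): 2, (0, 2): 1, (0, 3): 3, (1, 2): 3, (1, 3): 3}
print(density_vector_set(G, {0, 1}, 3))           # [1.0, 0.0, 0.0]
print(density_vector_pair(G, {0, 1}, {2, 3}, 3))  # [0.25, 0.0, 0.75]
```

Each vector sums to 1, as noted above.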

In this setting, a graph property is merely a set of r-graphs for some positive integer $r\ge2$. If $G=(V,c)$ and $G'=(V,c')$ are r-graphs on $n$ labeled vertices, then $\mathrm{dist}(G,G')$ is the proportion of edges on which the colors differ, i.e., the number of edges on which the colors in $G$ and $G'$ differ, divided by $\binom n2$. We may call this the normalized edit distance between $G$ and $G'$.

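For any property $\mathcal H$, a coloring $G$ and an integer $n$, we define $\mathrm{dist}(G,\mathcal H)$, $\mathrm{dist}(n,\mathcal H)$ and $\mathrm{dist}(\mathcal H)$ as follows:

$$\mathrm{dist}(G,\mathcal H):=\min\{\mathrm{dist}(G,G') : V(G')=V(G),\ G'\in\mathcal H\},$$
$$\mathrm{dist}(n,\mathcal H):=\max\{\mathrm{dist}(G,\mathcal H) : |V(G)|=n\},$$
$$\mathrm{dist}(\mathcal H):=\lim_{n\to\infty}\mathrm{dist}(n,\mathcal H).$$

Note that $\mathrm{dist}(G,G')$, $\mathrm{dist}(G,\mathcal H)$, $\mathrm{dist}(n,\mathcal H)$, $\mathrm{dist}(\mathcal H)\in[0,1]$.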
The last parameter, $\mathrm{dist}(\mathcal H)$, is the limit of the largest proportion of edges that must be changed in a coloring of a complete graph to bring it into the property $\mathcal H$; we show the existence of this limit later.

A hereditary property of r-graphs (or simply a hereditary property, where the context is understood) is a set of r-graphs that is closed under vertex-deletion and isomorphism. An r-graph $G'$ is an induced coloring of an r-graph $G$ if $G'$ can be obtained from $G$ by vertex-deletion.

For an r-graph $H$, the family $\mathrm{Forb}(H)$ consists of all r-graphs that have no (induced) copies of $H$. For every hereditary property $\mathcal H$, there is a family $\mathcal F(\mathcal H)$ of r-graphs such that $\mathcal H=\bigcap_{H\in\mathcal F(\mathcal H)}\mathrm{Forb}(H)$. If $\mathcal F$ is a family of r-graphs, then we use $\mathrm{Forb}(\mathcal F)$ to denote $\bigcap_{H\in\mathcal F}\mathrm{Forb}(H)$.
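For very small instances these definitions can be checked by exhaustive search. The sketch below is illustrative only (exponential time, tiny $n$): it assumes the same hypothetical dict representation `{(u, v): color}` with `u < v`, and it uses the fact that in a complete edge-colored graph an induced copy of $H$ is simply an injective vertex map preserving every edge color.

```python
from itertools import combinations, permutations, product

def edges(n):
    return list(combinations(range(n), 2))

def has_induced_copy(G, n, H, h):
    """True if the r-graph H on vertices 0..h-1 occurs as an induced copy in G on 0..n-1."""
    for image in permutations(range(n), h):  # injective maps V(H) -> V(G)
        if all(G[tuple(sorted((image[a], image[b])))] == H[(a, b)] for a, b in edges(h)):
            return True
    return False

def normalized_distance(G1, G2, n):
    """dist(G, G'): the fraction of the C(n, 2) edges whose colors differ."""
    E = edges(n)
    return sum(G1[e] != G2[e] for e in E) / len(E)

def brute_force_dist_to_forb(G, n, r, forbidden):
    """dist(G, Forb(F)) by trying every r-coloring of K_n; forbidden is a list of (H, h)."""
    E = edges(n)
    best = 1.0
    for colors in product(range(1, r + 1), repeat=len(E)):
        G2 = dict(zip(E, colors))
        if not any(has_induced_copy(G2, n, H, h) for H, h in forbidden):
            best = min(best, normalized_distance(G, G2, n))
    return best

K4_all_1 = {e: 1 for e in edges(4)}
mono_triangle = ({(0, 1): 1, (0, 2): 1, (1, 2): 1}, 3)
print(brute_force_dist_to_forb(K4_all_1, 4, 2, [mono_triangle]))  # 1/3
```

Here the color-1 class must become triangle-free, and a triangle-free graph on 4 vertices has at most 4 edges, so 2 of the 6 edges must change.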

2.2. The r-ary chromatic numbers 



Definition 1 For a hereditary property $\mathcal H=\bigcap_{H\in\mathcal F(\mathcal H)}\mathrm{Forb}(H)$ of r-graphs, a weakly-good tuple $(a_1,\dots,a_r)$ is an r-tuple of non-negative integers such that, for some $H\in\mathcal F(\mathcal H)$, the vertex set $V(H)$ can be partitioned into sets $S_1,\dots,S_r$ (with $S_i=\emptyset$ whenever $a_i=0$) such that, for each $i\in\{1,\dots,r\}$ with $a_i\neq0$, the partition can be further refined as $S_i=V_{i,1}\cup\cdots\cup V_{i,a_i}$, where no $V_{i,j}\subseteq S_i$ induces an edge of color $i$. The weak clique spectrum of $\mathcal H$ is the set of all tuples $(a_1,\dots,a_r)$ that are NOT weakly-good. The weak r-ary chromatic number of $\mathcal H$, $\chi_r^{\mathrm{wk}}(\mathcal H)$, is the maximum $\ell+1$ such that, for some non-negative integers $a_1,\dots,a_r$ with $a_1+\cdots+a_r=\ell$, the tuple $(a_1,\dots,a_r)$ is in the weak clique spectrum of $\mathcal H$.

For a hereditary property $\mathcal H$, a strongly-good tuple $(a_1,\dots,a_r)$ is an r-tuple of non-negative integers such that, for some $H\in\mathcal F(\mathcal H)$, the vertex set $V(H)$ can be partitioned into sets $S_1,\dots,S_r$ (with $S_i=\emptyset$ whenever $a_i=0$) such that, for each $i\in\{1,\dots,r\}$ with $a_i\neq0$, the partition can be further refined as $S_i=V_{i,1}\cup\cdots\cup V_{i,a_i}$, where every edge inside each $V_{i,j}\subseteq S_i$ has color $i$. The strong clique spectrum of $\mathcal H$ is the set of all tuples $(a_1,\dots,a_r)$ that are NOT strongly-good. The strong r-ary chromatic number of $\mathcal H$, $\chi_r^{\mathrm{st}}(\mathcal H)$, is the maximum $\ell+1$ such that, for some non-negative integers $a_1,\dots,a_r$ with $a_1+\cdots+a_r=\ell$, the tuple $(a_1,\dots,a_r)$ is in the strong clique spectrum of $\mathcal H$.

If $\mathcal H=\mathrm{Forb}(H)$, then we denote $\chi_r^{\mathrm{wk}}(H)=\chi_r^{\mathrm{wk}}(\mathcal H)$ and $\chi_r^{\mathrm{st}}(H)=\chi_r^{\mathrm{st}}(\mathcal H)$.

Remark 2

• The weak [strong] clique spectrum is a downset in the partially ordered set of r-tuples ordered coordinatewise. That is, if $(a_1,\dots,a_r)$ is in the weak [strong] clique spectrum and $(a'_1,\dots,a'_r)$ has the property that $0\le a'_i\le a_i$ for $i=1,\dots,r$, then $(a'_1,\dots,a'_r)$ is also in that weak [strong] clique spectrum.

• Informally, we can partition $V(H)$ into $\chi_r^{\mathrm{wk}}(H)$ pieces in which the absent color in each piece is arbitrary, but there is some specification of absent colors for which a $(\chi_r^{\mathrm{wk}}(H)-1)$-piece partition is not possible.

• Similarly, we can partition $V(H)$ into $\chi_r^{\mathrm{st}}(H)$ pieces in which the required color in each piece is arbitrary, but there is some specification of required colors for which a $(\chi_r^{\mathrm{st}}(H)-1)$-piece partition is not possible.

• For any $r\ge2$ and any hereditary property $\mathcal H$ of r-graphs, $\chi_r^{\mathrm{wk}}(\mathcal H)\le\chi_r^{\mathrm{st}}(\mathcal H)$.

• In the case $r=2$, the notions of strong and weak colorings are identical. Further, if $\mathcal H=\mathrm{Forb}(H)$, then $\chi_2(\mathcal H)$ corresponds exactly to the binary chromatic number of $H$, introduced in [4]. This is also called the "colouring number" in related literature such as Bollobás and Thomason [16, 17].



2.2.1. Examples illustrating the r-ary chromatic numbers of a hereditary family 

(1) Let $r=3$ and let $\mathcal H$ be the family of $\{1,2,3\}$-colored complete graphs containing neither a triangle $H_1$ with colors 1, 1, 2 on its edges nor a triangle $H_2$ with colors 2, 2, 3 on its edges. So, $\mathcal F(\mathcal H)=\{H_1,H_2\}$.

First, the weak 3-ary chromatic number. Since $r=3$ and $\mathcal F(\mathcal H)$ consists of triangles, any 3-tuple $(a_1,a_2,a_3)$ with $a_1+a_2+a_3\ge3$ must be weakly-good. Indeed, each of $H_1$ and $H_2$ can be vertex-partitioned into three parts, each a single vertex, which therefore induce no edges of any color. Thus, it suffices to consider the tuples with $a_1+a_2+a_3\le2$. The tuple $(1,0,0)$ is weakly-good, since the vertex set of $H_2$ forms one part containing no edges of color 1. Similarly, $(0,0,1)$ is weakly-good, using $H_1$. By monotonicity, all tuples $(a_1,a_2,a_3)$ with $a_1\ge1$ or $a_3\ge1$ are weakly-good.

However, $(0,1,0)$ is not weakly-good because both $H_1$ and $H_2$ contain edges of color 2. But $(0,2,0)$ is weakly-good because $H_1$ can be vertex-partitioned into two parts containing no edges of color 2. Thus, the weak clique spectrum of $\mathcal H$ is $\{(0,1,0),(0,0,0)\}$, and the weak 3-ary chromatic number is

$$\chi_3^{\mathrm{wk}}(\mathcal H)=2.$$

Second, the strong 3-ary chromatic number. Similarly to the above, if $a_1+a_2+a_3\ge2$, then either $H_1$ or $H_2$ can be partitioned into two parts such that one part is an edge of a specified color and the other part is a vertex. If $a_1+a_2+a_3\le1$, then neither $H_1$ nor $H_2$ can be partitioned into a single monochromatic clique. Thus, the strong clique spectrum of $\mathcal H$ is $\{(1,0,0),(0,1,0),(0,0,1),(0,0,0)\}$, and the strong 3-ary chromatic number is $\chi_3^{\mathrm{st}}(\mathcal H)=2$ also.

(2) Let $r=3$ and let $\mathcal H$ be the family of $\{1,2,3\}$-colored complete graphs not containing a triangle $H_1$ with colors 1, 1, 2 on its edges. So, $\mathcal F(\mathcal H)=\{H_1\}$. Following the previous example, it is easy to see that $(a_1,a_2,1)$ is weakly-good for all $a_1,a_2\ge0$. Moreover, $(a_1,a_2,0)$ is weakly-good as long as $a_1+a_2\ge2$. Hence, the weak clique spectrum of $\mathcal H$ is $\{(1,0,0),(0,1,0),(0,0,0)\}$ and $\chi_3^{\mathrm{wk}}(\mathcal H)=2$.

For the strong clique spectrum, it is easy to see that $(0,0,2)$ is in that spectrum, but if $a_1+a_2+a_3\ge3$, then $(a_1,a_2,a_3)$ is strongly-good. Thus, $\chi_3^{\mathrm{st}}(\mathcal H)=3$.

(3) Let $r=2$, which we can consider to be the graph case. As we have observed, we may disregard the notions of "weak" and "strong" in our terminology. Let $H$ be a $K_5$ whose edges are colored with colors 1 and 2 such that each color class is a 5-cycle. Let $\mathcal H$ be the family of colorings not containing $H$, i.e., $\mathcal F(\mathcal H)=\{H\}$.

We need only consider 2-tuples (i.e., pairs) $(a_1,a_2)$ with $a_1+a_2\le4$. It is relatively easy to see that $(2,1)$, $(1,2)$, $(3,0)$ and $(0,3)$ are good. The pairs $(2,0)$, $(0,2)$ and $(1,1)$ are not good, since $H$ has no monochromatic clique on more than 2 vertices but has a total of 5 vertices. By monotonicity, $(1,0)$ and $(0,1)$ are also not good. Thus, the clique spectrum of $\mathcal H$ is $\{(2,0),(1,0),(1,1),(0,2),(0,1),(0,0)\}$, and $\chi_2(\mathcal H)=3$.
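The three examples can be double-checked by brute force. The following sketch is an illustration under assumptions, not the paper's method: r-graphs use a hypothetical dict `{(u, v): color}` representation, every vertex gets a part label $(i,j)$ with $1\le j\le a_i$ (parts may be empty), and the search uses the fact, noted in Example (1), that any tuple whose sum is at least $|V(H)|$ is good via singleton parts.

```python
from itertools import product

def is_good(H, h, r, a, strong):
    """Brute force: is the r-tuple a weakly-good (strong=False) or strongly-good (strong=True)
    for the r-graph H on vertices 0..h-1? Parts may be empty."""
    labels = [(i, j) for i in range(1, r + 1) for j in range(a[i - 1])]
    for assignment in product(labels, repeat=h):
        ok = True
        for (u, v), col in H.items():
            if assignment[u] == assignment[v]:
                i = assignment[u][0]
                # weak: a part labelled i may not contain color i; strong: it must be all color i
                if (col == i) != strong:
                    ok = False
                    break
        if ok:
            return True
    return False

def compositions(total, r):
    """All r-tuples of non-negative integers with the given sum."""
    if r == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, r - 1):
            yield (first,) + rest

def clique_spectrum(family, r, strong):
    """Tuples that are good for no H in the family; sums above max |V(H)| are always good."""
    max_h = max(h for _, h in family)
    return {a for s in range(max_h + 1) for a in compositions(s, r)
            if not any(is_good(H, h, r, a, strong) for H, h in family)}

def chromatic_number(family, r, strong):
    return max(sum(a) for a in clique_spectrum(family, r, strong)) + 1

def triangle(c01, c02, c12):
    return ({(0, 1): c01, (0, 2): c02, (1, 2): c12}, 3)

H1, H2 = triangle(1, 1, 2), triangle(2, 2, 3)
# K_5 whose color-1 class is the 5-cycle 0-1-2-3-4-0 (Example (3)).
C5 = ({(u, v): 1 if v - u in (1, 4) else 2 for u in range(5) for v in range(u + 1, 5)}, 5)

print(sorted(clique_spectrum([H1, H2], 3, strong=False)))  # [(0, 0, 0), (0, 1, 0)]
print(chromatic_number([H1], 3, False), chromatic_number([H1], 3, True))  # 2 3
print(chromatic_number([C5], 2, False))  # 3
```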
2.3. A simple editing algorithm



Let $\mathcal H$ be a hereditary property of r-graphs such that $\mathcal H=\bigcap_{H\in\mathcal F(\mathcal H)}\mathrm{Forb}(H)$. Further, let $\ell=\chi_r^{\mathrm{wk}}(\mathcal H)-1$ and let $(a_1,\dots,a_r)$ be in the weak clique spectrum with $\sum_{i=1}^r a_i=\ell$.

Partition $V$ into $r$ sets $S_1,\dots,S_r$ and further refine the partition such that $S_i=V_{i,1}\cup\cdots\cup V_{i,a_i}$ for $i=1,\dots,r$; then, within each $V_{i,j}$, recolor every edge of color $i$ with some other, arbitrary color. This new coloring does not contain any $H\in\mathcal F(\mathcal H)$; otherwise, the tuple $(a_1,\dots,a_r)$ would be weakly-good for that $H$.

If the sizes of the $V_{i,j}$ differ by at most one, i.e., $\lfloor n/\ell\rfloor\le|V_{i,j}|\le\lceil n/\ell\rceil$, then the number of changes made by this algorithm is at most $\ell\binom{\lceil n/\ell\rceil}{2}$. Thus,

$$\mathrm{dist}(\mathcal H)\le\lim_{n\to\infty}\frac{\ell\binom{\lceil n/\ell\rceil}{2}}{\binom n2}=\frac1\ell=\frac{1}{\chi_r^{\mathrm{wk}}(\mathcal H)-1}.$$
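A minimal sketch of the simple editing algorithm, assuming the hypothetical dict representation `{(u, v): color}` of an r-graph. The round-robin assignment of vertices to the $\ell$ parts keeps part sizes within one of each other, and the replacement color `i % r + 1` is an arbitrary choice; the text allows any color other than $i$.

```python
import math
import random
from itertools import combinations

def simple_edit(coloring, n, r, a):
    """Sketch of the Section 2.3 algorithm for a tuple a = (a_1, ..., a_r) with ell = sum(a) >= 1.
    Vertices go round-robin into ell parts; a_i of the parts are assigned color i, and inside a
    part assigned color i every edge of color i is recolored to i % r + 1 (an arbitrary choice)."""
    ell = sum(a)
    part_color = [i for i in range(1, r + 1) for _ in range(a[i - 1])]
    part_of = [v % ell for v in range(n)]
    new, changes = dict(coloring), 0
    for (u, v), col in coloring.items():
        if part_of[u] == part_of[v] and col == part_color[part_of[u]]:
            new[(u, v)] = col % r + 1
            changes += 1
    return new, part_of, part_color, changes

def change_bound(n, ell):
    """The bound ell * C(ceil(n / ell), 2) on the number of recolored edges."""
    return ell * math.comb(math.ceil(n / ell), 2)

rng = random.Random(0)
n, r = 10, 3
G = {e: rng.randint(1, r) for e in combinations(range(n), 2)}
new, part_of, part_color, changes = simple_edit(G, n, r, (1, 1, 0))
print(changes, "<=", change_bound(n, 2))
```

With the tuple $(0,1,0)$ from Example (1) of Section 2.2.1, the sketch removes every edge of color 2, so neither $H_1$ nor $H_2$ can survive.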



2.4. Previous results and new main results 



In [4], the authors and Kézdy provide a general bound for $\mathrm{dist}(\mathcal H)$ in the 2-color case.

Theorem 3 ([4]) For any hereditary property of graphs $\mathcal H$ with binary chromatic number $\chi_2=\chi_2(\mathcal H)\ge2$,

$$\frac{1}{2(\chi_2-1)}\le\mathrm{dist}(\mathcal H)\le\frac{1}{\chi_2-1}.$$

Furthermore, if $\mathcal H=\mathrm{Forb}(H)$ such that $H$ is self-complementary, then $\mathrm{dist}(\mathcal H)=\frac{1}{2(\chi_2(H)-1)}$.

Here, we show a similar result in the general case.

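Theorem 4 Let $\mathcal H$ be a hereditary property of $\{1,\dots,r\}$-edge-colorings of complete graphs. Let $\chi_r^{\mathrm{wk}}=\chi_r^{\mathrm{wk}}(\mathcal H)\ge2$ and $\chi_r^{\mathrm{st}}=\chi_r^{\mathrm{st}}(\mathcal H)\ge2$ be the weak and strong (respectively) r-ary chromatic numbers of $\mathcal H$. Then,

$$\frac{1}{r(\chi_r^{\mathrm{st}}-1)}\le\mathrm{dist}(\mathcal H)\le\frac{1}{\chi_r^{\mathrm{wk}}-1}.$$

Furthermore, if $\mathcal H=\mathrm{Forb}(H)$ such that all color classes of $H$ are isomorphic, then $\mathrm{dist}(\mathcal H)\le\frac{1}{r(\chi_r^{\mathrm{wk}}-1)}$.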
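As a worked instance, using only values computed earlier in this section: for the family of Example (2) in Section 2.2.1 (a single triangle with colors 1, 1, 2; $r=3$, $\chi_3^{\mathrm{wk}}=2$, $\chi_3^{\mathrm{st}}=3$), Theorem 4 gives

$$\frac{1}{3(3-1)}=\frac16\ \le\ \mathrm{dist}(\mathcal H)\ \le\ \frac{1}{2-1}=1,$$

which is consistent with the exact value $\mathrm{dist}(\mathcal H)=1/2$ stated for this family in part 2 of Theorem 11.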

We prove Theorem 4 in Section 2.9. The upper bound follows from the simple editing algorithm, but the lower bound requires a more general theory: Theorem 8, which is stated in Section 2.7. We also prove the result for symmetric colorings in Corollary 10. Theorem 8 gives the basic results that deal with computing the edit distance for given hereditary properties. To state these results, we need some preliminary material.



2.5. The edit distance function 
2.5.1. Preliminary definitions For an r-graph $G=(V,c)$ and some color $\rho\in\{1,\dots,r\}$, let $E_\rho(G)$ denote the graph on vertex set $V$ consisting of the edges with color $\rho$ in $c$. For a positive integer $r$, recall that a density vector $\mathbf p=(p_1,\dots,p_r)$ (we also refer to it as a probability vector) is a nonnegative real vector with the property that $\sum_{\rho=1}^r p_\rho=1$. For any density vector $\mathbf p=(p_1,\dots,p_r)$ and integer $n$, we denote¹

$$\mathrm{dist}_n(\mathbf p,\mathcal H)=\max\Big\{\mathrm{dist}(G,\mathcal H) : |V(G)|=n \text{ and } |E_\rho(G)|=p_\rho\tbinom n2,\ \rho=1,\dots,r\Big\}.$$

In Theorem 8, we show that the following limit exists, which we call the edit distance function:

$$\mathrm{dist}(\mathbf p,\mathcal H)=\lim_{n\to\infty}\mathrm{dist}_n(\mathbf p,\mathcal H).$$

Having the edit distance function at our disposal, we may also define $\mathrm{dist}(\mathcal H)=\max_{\mathbf p}\mathrm{dist}(\mathbf p,\mathcal H)$, where the maximum is taken over all density vectors.

2.5.2. Types of colorings In Section 2.6.1, we define two functions, described in terms of types of colorings, that allow us to compute the edit distance function. In Section 2.6, we provide an algorithm for such a computation. We use a notion that Alon and Stav [3] called a colored regularity graph (CRG), but that was earlier called a type by Bollobás and Thomason [16]. We adopt the latter terminology.

Definition 5 An r-type (or just type, where the context is understood) $K$ is a pair $(U,\phi)$, where $U$ is a finite set of vertices and $\phi:U\times U\to2^{\{1,\dots,r\}}\setminus\{\emptyset\}$, such that $\phi(x,y)=\phi(y,x)$ and $\phi(x,x)\neq\{1,\dots,r\}$ for all $x,y\in U$. Informally, we view an r-type as a complete graph with a coloring of both vertices and edges using nonempty subsets of $\{1,\dots,r\}$, where the whole set is a forbidden color on the vertices. The sub-r-type of $K$ induced by $W\subseteq U$ is the r-type obtained by deleting the vertices $U\setminus W$ from $K$.

We say that an r-graph $H=(V,c)$ embeds in type $K=(U,\phi)$ if there is a map $\gamma:V\to U$ such that $c(\{v,v'\})=c_0$ implies $c_0\in\phi(\gamma(v),\gamma(v'))$. In other words, there is a mapping $\gamma$ that sends each edge of color $c_0$ to a vertex or an edge whose color set contains $c_0$. If $H$ embeds in type $K$, we write $H\mapsto K$; otherwise we write $H\not\mapsto K$. For every hereditary property $\mathcal H$, we let $\mathcal K(\mathcal H)$ be the set of all r-types such that no member of $\mathcal F(\mathcal H)$ embeds in that type, i.e.,

$$\mathcal K(\mathcal H)=\{K : K \text{ is an r-type and } H\not\mapsto K \ \ \forall H\in\mathcal F(\mathcal H)\}.$$

We say that an r-graph $G'=(V,c)$ has type $K=(U,\phi)$ if $G'$ embeds into $K$ with a mapping $\gamma:V\to U$ that is surjective.
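Embeddability into a type can likewise be tested by exhaustive search over all maps $\gamma$ (not necessarily injective). This sketch assumes a hypothetical encoding: $U=\{0,\dots,k-1\}$ and `phi[(x, y)]` for $x\le y$ holds the color set, with `phi[(x, x)]` the vertex color set. The two sample types mirror the recolorings used in the proof of Theorem 11, parts (1) and (2).

```python
from itertools import product

def embeds(H, h, phi, k):
    """Brute force: is there a map gamma: V(H) -> U sending every edge of color c to a vertex
    or edge of K whose color set contains c?"""
    for gamma in product(range(k), repeat=h):
        if all(col in phi[(min(gamma[u], gamma[v]), max(gamma[u], gamma[v]))]
               for (u, v), col in H.items()):
            return True
    return False

full = {1, 2, 3}
K_a = {(0, 0): {2, 3}, (1, 1): {2, 3}, (0, 1): full}  # two vertices that forbid color 1
K_b = {(0, 0): {3}, (1, 1): {3}, (0, 1): full}        # two vertices that allow only color 3
mono1 = {(0, 1): 1, (0, 2): 1, (1, 2): 1}             # monochromatic triangle of color 1
tri112 = {(0, 1): 1, (0, 2): 1, (1, 2): 2}            # triangle with colors 1, 1, 2
print(embeds(mono1, 3, K_a, 2), embeds(tri112, 3, K_a, 2), embeds(tri112, 3, K_b, 2))
# False True False
```

By pigeonhole, two vertices of a triangle share a type vertex, which explains the two `False` results.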

Fact 6 generalizes the ideas underlying the simple editing algorithm in Section 2.3. 

Fact 6 If $K$ is an r-type, $G'$ is of type $K$ and $H$ does not embed into $K$, then $G'$ contains no induced copy of $H$.



1 Formally, the sizes of the color classes should be integers, so we can take the floor of $p_\rho\binom n2$ for $\rho=1,\dots,r-1$, and the size of $E_r$ is what remains. Since we fix $p_\rho$ for $\rho=1,\dots,r$ and let $n$ approach infinity, this makes no appreciable difference.
2.6. Editing algorithm using types



Let $\mathbf p=(p_1,\dots,p_r)$ and $\mathbf w=(w_1,\dots,w_k)$ be density vectors, i.e., their entries are nonnegative and sum to 1. They play different roles, however. The vector $\mathbf p$ represents a vector of densities $p_\rho$: the graph $G$ has $p_\rho\binom n2$ edges of color $\rho$. The vector $\mathbf w$ represents a vector of weights $w_1,\dots,w_k$ assigned to the vertices $u_1,\dots,u_k$ of an r-type, respectively.

Let $G=(V,c)$ be an r-graph whose color classes have densities given by the vector $\mathbf p=(p_1,\dots,p_r)$, and let $\mathcal H$ be a hereditary property. To find an upper bound on $\mathrm{dist}(G,\mathcal H)$, it suffices to change $G$ into an r-graph $G'$ that contains no induced copy of any $H\in\mathcal F(\mathcal H)$. In particular, if the resulting coloring has type $K\in\mathcal K(\mathcal H)$, then $G'$ is in $\mathcal H$.

Algorithm 7 Fix a $K=(U,\phi)\in\mathcal K(\mathcal H)$ and bring $G$ to a coloring of type $K$ by edge-recoloring. Let $U=\{u_1,\dots,u_k\}$. Partition the vertices of $G$ randomly into sets $V_1,\dots,V_k$ such that each vertex is placed in part $V_i$ with probability $w_i$. Consider an edge $\{x,y\}$ of $G$ with $x\in V_i$ and $y\in V_j$ for some $i,j\in\{1,\dots,k\}$. If $c(\{x,y\})\notin\phi(u_i,u_j)$, recolor $\{x,y\}$ with a color from $\phi(u_i,u_j)$. This gives the new r-graph $G'$ which, according to Fact 6, contains no induced copy of any $H\in\mathcal F(\mathcal H)$; thus $G'\in\mathcal H$.
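Algorithm 7 can be sketched in a few lines. The encoding is assumed (a dict `{(u, v): color}` for the r-graph, `phi[(i, j)]` with $i\le j$ for the type), and recoloring to the smallest allowed color is an arbitrary choice: the algorithm permits any color from $\phi(u_i,u_j)$. The demonstration uses the two-vertex type from the proof of Theorem 11, part (1), on an all-color-1 $K_{30}$.

```python
import random
from itertools import combinations

def edit_to_type(coloring, n, phi, w, rng):
    """Sketch of Algorithm 7: vertex x goes to part i with probability w[i]; an edge whose color
    is not in phi for its pair of parts is recolored to the smallest allowed color."""
    part = [rng.choices(range(len(w)), weights=w)[0] for _ in range(n)]
    new, changes = {}, 0
    for (x, y), col in coloring.items():
        allowed = phi[(min(part[x], part[y]), max(part[x], part[y]))]
        new[(x, y)] = col if col in allowed else min(allowed)
        changes += col not in allowed
    return new, part, changes

n = 30
G = {e: 1 for e in combinations(range(n), 2)}            # every edge has color 1
K = {(0, 0): {2, 3}, (1, 1): {2, 3}, (0, 1): {1, 2, 3}}  # vertices forbid color 1
new, part, changes = edit_to_type(G, n, K, [0.5, 0.5], random.Random(1))
print(changes, "of", len(G), "edges recolored")
```

After editing, color 1 survives only between the two parts, so the color-1 class is bipartite and has no triangle.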

Note that this generalizes the simple algorithm in Section 2.3. In that algorithm, the type has restricted color sets only on its vertices (possibly different ones), while every edge of the type receives the full color set $\{1,\dots,r\}$.

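2.6.1. Analysis of the editing algorithm Consider Algorithm 7 applied with type $K$ and weight vector $\mathbf w=(w_1,\dots,w_k)$. Let $G$ be an r-graph in which the number of edges of color $\rho$ is $p_\rho\binom n2$ for $\rho=1,\dots,r$. Ordering the endpoints of an edge as $(x,y)$, we have $x\in V_i$ and $y\in V_j$ with probability $w_iw_j$, and an edge of color $\rho$ is changed exactly when $\rho\notin\phi(u_i,u_j)$. Hence, the expected number of changes is

$$\mathbb E[\#\text{changes}]=\binom n2\sum_{i=1}^k\sum_{j=1}^k w_iw_j\Big(1-\sum_{\rho\in\phi(u_i,u_j)}p_\rho\Big).$$

Let $M_K(\mathbf p)$ be the $k\times k$ matrix whose $(i,j)$-th entry is $M_K(\mathbf p)(i,j)=1-\sum_{\rho\in\phi(u_i,u_j)}p_\rho$. Thus, if $\mathbf w=(w_1,\dots,w_k)$, then

$$\mathbb E[\#\text{changes}]=\mathbf w^TM_K(\mathbf p)\,\mathbf w\,\binom n2.$$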
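Finally, we define two functions in terms of the matrix $M_K(\mathbf p)$:

$$f_K(\mathbf p)=\Big(\frac1k\mathbf 1\Big)^TM_K(\mathbf p)\Big(\frac1k\mathbf 1\Big)\qquad\text{and}\qquad g_K(\mathbf p)=\min\big\{\mathbf w^TM_K(\mathbf p)\,\mathbf w : \mathbf w^T\mathbf 1=1,\ \mathbf w\ge\mathbf 0\big\}.$$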

The $f$ and $g$ functions can be interpreted as follows: if the vertices of an r-graph $G$ are assigned randomly to parts corresponding to the vertices of $K$, then $f_K(\mathbf p)$ and $g_K(\mathbf p)$ represent the expected proportion of edges whose color does not belong to the color set of the corresponding vertex or edge of $K$. The function $f_K(\mathbf p)$ is obtained from the uniform distribution, and $g_K(\mathbf p)$ is obtained using the optimal distribution $(w_1,\dots,w_k)$ of part sizes. Although the $g$ function provides a better bound for $\mathrm{dist}(\mathbf p,\mathcal H)$, the linearity of the $f$ function in $\mathbf p$ helps prove results about $\mathrm{dist}(\mathbf p,\mathcal H)$.
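The matrix $M_K(\mathbf p)$ and the functions $f_K$, $g_K$ translate directly into code. This is a sketch under assumptions: the type encoding `phi[(i, j)]` ($i\le j$) is hypothetical, and `g_K_grid` replaces the exact minimization over the simplex by a search over grid points with denominator `steps`, so it is exact only when the minimizer lies on the grid.

```python
from itertools import product

def M_matrix(phi, k, p):
    """M_K(p)(i, j) = 1 - sum of p_rho over rho in phi(u_i, u_j); colors are 1..r."""
    return [[1 - sum(p[rho - 1] for rho in phi[(min(i, j), max(i, j))]) for j in range(k)]
            for i in range(k)]

def quad(M, w):
    """The quadratic form w^T M w."""
    return sum(w[i] * M[i][j] * w[j] for i in range(len(w)) for j in range(len(w)))

def f_K(phi, k, p):
    """f_K(p): the quadratic form at the uniform weight vector (1/k, ..., 1/k)."""
    return quad(M_matrix(phi, k, p), [1 / k] * k)

def g_K_grid(phi, k, p, steps=200):
    """Grid approximation of g_K(p) = min of w^T M_K(p) w over nonnegative w summing to 1."""
    M = M_matrix(phi, k, p)
    best = float("inf")
    for head in product(range(steps + 1), repeat=k - 1):
        if sum(head) <= steps:
            w = [h / steps for h in head] + [(steps - sum(head)) / steps]
            best = min(best, quad(M, w))
    return best

K1 = {(0, 0): {2, 3}, (1, 1): {2, 3}, (0, 1): {1, 2, 3}}  # type from Theorem 11 (1)
K2 = {(0, 0): {2, 3}, (1, 1): {1, 3}, (0, 1): {1, 2, 3}}  # hypothetical asymmetric type
print(f_K(K1, 2, [1, 0, 0]), g_K_grid(K1, 2, [1, 0, 0]))  # 0.5 0.5
print(f_K(K2, 2, [0.8, 0.2, 0.0]), g_K_grid(K2, 2, [0.8, 0.2, 0.0]))  # about 0.25 and 0.16
```

For `K2`, $M_K(\mathbf p)=\mathrm{diag}(0.8,0.2)$, so uniform weights give $0.25$ while the optimal weights $(0.2,0.8)$ give $0.16$, illustrating $g_K\le f_K$.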

2.7. Basic results on r-graphs 

Theorem 8 summarizes some facts about the edit distance function that generalize easily from earlier results; see [6] or [12]. The proof is in Section 2.9.2. Fix a density vector $\mathbf p=(p_1,\dots,p_r)$. Formally, the random r-graph of density $\mathbf p$, or random r-graph where the context is clear, is denoted $G(n,\mathbf p)$. It is a random variable that is a $\{1,\dots,r\}$-coloring of the edges of a labeled $K_n$ in which each edge $e$ is colored independently, receiving color $\rho$ with probability $p_\rho$.
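Sampling $G(n,\mathbf p)$ is straightforward; a minimal sketch using the standard library's `random.Random.choices`, with the same hypothetical dict representation of an r-graph:

```python
import random
from itertools import combinations

def random_r_graph(n, p, rng):
    """Sample G(n, p): every edge of K_n independently receives color rho with probability p[rho-1]."""
    edges = list(combinations(range(n), 2))
    colors = rng.choices(range(1, len(p) + 1), weights=p, k=len(edges))
    return dict(zip(edges, colors))

G = random_r_graph(200, [0.5, 0.3, 0.2], random.Random(3))
fractions = [sum(1 for c in G.values() if c == rho) / len(G) for rho in (1, 2, 3)]
print(fractions)  # close to [0.5, 0.3, 0.2] for large n
```

For large $n$ the color-class densities concentrate around $\mathbf p$, which is why footnote 1's rounding makes no appreciable difference.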

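Theorem 8 Let $\mathcal H$ be a hereditary property of r-graphs. Fix an r-dimensional density vector $\mathbf p$. Then the limit $\mathrm{dist}(\mathbf p,\mathcal H):=\lim_{n\to\infty}\mathrm{dist}_n(\mathbf p,\mathcal H)$ exists. Moreover,

1. $\mathrm{dist}(\mathbf p,\mathcal H)=\inf_{K\in\mathcal K(\mathcal H)}f_K(\mathbf p)=\inf_{K\in\mathcal K(\mathcal H)}g_K(\mathbf p)$;

2. for a fixed $\varepsilon>0$, with probability approaching 1 as $n\to\infty$,

$$\mathrm{dist}(\mathbf p,\mathcal H)-\varepsilon\le\mathrm{dist}(G(n,\mathbf p),\mathcal H)\le\mathrm{dist}(\mathbf p,\mathcal H);$$

3. $\mathrm{dist}(\mathbf p,\mathcal H)=\lim_{n\to\infty}\mathbb E[\mathrm{dist}(G(n,\mathbf p),\mathcal H)]$;

4. $\mathrm{dist}(\mathbf p,\mathcal H)$ is continuous over the domain of r-dimensional density vectors and is concave down;²

5. $\mathrm{dist}(\mathbf p,\mathcal H)$ achieves its maximum, $\mathrm{dist}(\mathcal H)$, at some density vector $\mathbf p^*_{\mathcal H}$ (in fact, we denote the set of all such vectors by $\mathbf p^*_{\mathcal H}$), and so

$$\mathrm{dist}(\mathcal H)=\lim_{n\to\infty}\mathbb E[\mathrm{dist}(G(n,\mathbf p^*_{\mathcal H}),\mathcal H)];\ \text{and}$$

6. both $\mathbf p^*_{\mathcal H}$ and $\mathrm{dist}(\mathcal H)$ exist, and $\mathbf p^*_{\mathcal H}$ is a convex and closed set in $[0,1]^r$.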

Remark 9 Note that $\mathbf p^*_{\mathcal H}$ typically consists of a single vector, but we abuse notation by denoting the set of such vectors as $\mathbf p^*_{\mathcal H}$ when the vector at which the maximum is attained is not unique.

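Corollary 10 Let $\mathcal H$ be a symmetric hereditary property; that is, one with the property that if the r-tuple $(a_1,\dots,a_r)$ is in the weak clique spectrum of $\mathcal H$, then for any permutation $\pi$ of $\{1,\dots,r\}$, the r-tuple $(a_{\pi(1)},\dots,a_{\pi(r)})$ is also in the weak clique spectrum. Then, for every tuple $(a_1,\dots,a_r)\neq(0,\dots,0)$ in the weak clique spectrum of $\mathcal H$,

$$\mathrm{dist}(\mathcal H)\le r^{-1}\Big(\sum_{i=1}^r a_i\Big)^{-1}.$$

In particular, if $\mathcal H=\mathrm{Forb}(H)$ such that all color classes of $H$ are isomorphic, then

$$\mathrm{dist}(\mathcal H)\le\frac{1}{r(\chi_r^{\mathrm{wk}}(H)-1)}.$$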

2 A function $\psi(\mathbf p)$ being concave down means that for every pair of density vectors $\mathbf p_1,\mathbf p_2$ and every real number $t\in[0,1]$, $t\mathbf p_1+(1-t)\mathbf p_2$ is a density vector and $\psi(t\mathbf p_1+(1-t)\mathbf p_2)\ge t\psi(\mathbf p_1)+(1-t)\psi(\mathbf p_2)$.







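Proof of Corollary 10. Consider an arbitrary density vector $\mathbf p=(p_1,\dots,p_r)$ and, without loss of generality, assume that $p_1\le\cdots\le p_r$. By symmetry, we may permute the $a_i$'s so that $a_1\ge\cdots\ge a_r$ while the tuple stays in the weak clique spectrum. Let $K=(U,\phi)$ be an r-type on $\ell=\sum_{i=1}^r a_i$ vertices such that $\phi(u_i,u_j)=\{1,\dots,r\}$ if $i\neq j$, and there are exactly $a_j$ vertices $u$ such that $\phi(u,u)=\{1,\dots,r\}\setminus\{j\}$. Since $(a_1,\dots,a_r)$ is not weakly-good, no $H\in\mathcal F(\mathcal H)$ embeds in $K$, so $K\in\mathcal K(\mathcal H)$.

The off-diagonal entries of $M_K(\mathbf p)$ are zero, and the diagonal entry of a vertex $u$ with $\phi(u,u)=\{1,\dots,r\}\setminus\{j\}$ equals $p_j$, so $f_K(\mathbf p)=\ell^{-2}\sum_{i=1}^r a_ip_i$. Since the $a_i$ are non-increasing and the $p_i$ are non-decreasing, we can use a correlation inequality such as FKG [8] to see that

$$f_K(\mathbf p)=\ell^{-2}\sum_{i=1}^r a_ip_i\le\ell^{-2}r^{-1}\Big(\sum_{i=1}^r a_i\Big)\Big(\sum_{i=1}^r p_i\Big)=r^{-1}\ell^{-1}.$$

By Theorem 8, $\mathrm{dist}(\mathbf p,\mathcal H)\le f_K(\mathbf p)$, and since $\mathbf p$ was arbitrary, $\mathrm{dist}(\mathcal H)\le r^{-1}\ell^{-1}$. To finish the proof, observe that in the case $\mathcal H=\mathrm{Forb}(H)$, we may take $\ell=\sum_{i=1}^r a_i=\chi_r^{\mathrm{wk}}(H)-1$. □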
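The key inequality of the proof can be checked numerically. The sketch below builds the matrix $M_K(\mathbf p)$ of the type $K$ from the proof entry by entry (the function name is hypothetical), compares $f_K(\mathbf p)$ with the bound $1/(r\ell)$, and is meant as a sanity check rather than a proof.

```python
import random

def corollary10_f(a, p):
    """f_K(p) for the type K in the proof of Corollary 10: ell = sum(a) vertices, all edges get
    every color, and a_j vertices have color set {1..r} minus {j}."""
    r, ell = len(a), sum(a)
    missing = [j for j in range(1, r + 1) for _ in range(a[j - 1])]
    full = set(range(1, r + 1))
    total = 0.0
    for x in range(ell):
        for y in range(ell):
            allowed = full - {missing[x]} if x == y else full
            total += 1 - sum(p[rho - 1] for rho in allowed)  # entry of M_K(p)
    return total / ell ** 2

a, p = (2, 1, 0), (0.2, 0.3, 0.5)   # a non-increasing, p non-decreasing
print(corollary10_f(a, p), "<=", 1 / (len(a) * sum(a)))  # roughly 0.0778 <= 0.1111
```

Here $f_K(\mathbf p)=(2\cdot0.2+1\cdot0.3)/9\approx0.078$, below $1/(3\cdot3)\approx0.111$.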



2.8. Example: triangles 



Theorem 11 gives some basic results on examples of hereditary properties of r-graphs defined by 
triangles. The proof is in Section 2.9.3. 

Theorem 11 Let r = 3 and consider hereditary properties of r-graphs. 

1. If $\mathcal F$ is a family that consists of a single monochromatic triangle, then $\mathrm{dist}(\mathrm{Forb}(\mathcal F))=1/2$.

2. If $\mathcal F$ is a family that consists of a single triangle with two edges colored 1 and one edge colored 2, then $\mathrm{dist}(\mathrm{Forb}(\mathcal F))=1/2$.

3. If $\mathcal F$ is a family that consists of two monochromatic triangles of different colors, then $\mathrm{dist}(\mathrm{Forb}(\mathcal F))=1/2$.

4. If $\mathcal F$ is a family that consists of all six bichromatic triangles, then $\mathrm{dist}(\mathrm{Forb}(\mathcal F))=2/3$.

5. If $\mathcal F$ is a family that consists of a single rainbow triangle, then $\mathrm{dist}(\mathrm{Forb}(\mathcal F))=1/3$.



2.9. Proofs 



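2.9.1. Proof of Theorem 4 The upper bound for this theorem is proven by the simple editing algorithm from Section 2.3.

For the lower bound, we apply part (1) of Theorem 8, which states that $\mathrm{dist}(\mathbf p,\mathcal H)=\inf_{K\in\mathcal K(\mathcal H)}f_K(\mathbf p)$. Consider an arbitrary $K=(U,\phi)\in\mathcal K(\mathcal H)$, an r-type on $k$ vertices. Let $\tilde K$ be an auxiliary graph with vertex set $U$ in which $u$ and $u'$ are adjacent if and only if $\phi(u,u')=\{1,\dots,r\}$. We observe that $\tilde K$ has no clique on $\chi_r^{\mathrm{st}}=\chi_r^{\mathrm{st}}(\mathcal H)$ vertices; otherwise, $H\mapsto K$ for some $H\in\mathcal F(\mathcal H)$. By Turán's theorem, the number of edges of $\tilde K$ is at most $\frac{\chi_r^{\mathrm{st}}-2}{\chi_r^{\mathrm{st}}-1}\cdot\frac{k^2}{2}$.

Let $\mathbf p=\frac1r\mathbf 1$. Consider the matrix $M_K(\mathbf p)$ and observe that every entry is either zero or a positive integer multiple of $1/r$. The zero entries correspond exactly to pairs whose $\phi$ value equals $\{1,\dots,r\}$; in particular, diagonal entries are never zero, because $\phi(u,u)\neq\{1,\dots,r\}$. Thus, $M_K(\mathbf p)$ has at least

$$k^2-2\cdot\frac{\chi_r^{\mathrm{st}}-2}{\chi_r^{\mathrm{st}}-1}\cdot\frac{k^2}{2}=\frac{k^2}{\chi_r^{\mathrm{st}}-1}$$

entries with value at least $1/r$. Therefore, $f_K(\mathbf p)=k^{-2}\,\mathbf 1^TM_K(\mathbf p)\mathbf 1\ge\frac{1}{r(\chi_r^{\mathrm{st}}-1)}$. Since $K$ was arbitrary, this gives a lower bound for $\mathrm{dist}(\mathbf p,\mathcal H)$, and hence for $\mathrm{dist}(\mathcal H)\ge\mathrm{dist}(\mathbf p,\mathcal H)$.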



2.9.2. Proof of Theorem 8 

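Let $f(\mathbf p)=\inf_{K\in\mathcal K(\mathcal H)}f_K(\mathbf p)$ and let $g(\mathbf p)=\inf_{K\in\mathcal K(\mathcal H)}g_K(\mathbf p)$.

A: Upper bound on $\mathrm{dist}(\mathbf p,\mathcal H)$.

Let $G$ be an r-graph whose $\rho$-th color class has density $p_\rho$ for $\rho=1,\dots,r$, and let $K\in\mathcal K(\mathcal H)$. Apply the editing algorithm of Section 2.6 to $G$ using $K$ and the uniform weight vector. The analysis of the algorithm in Section 2.6.1 gives that the expected number of changes is $f_K(\mathbf p)\binom n2$, so some outcome of the random partition uses at most that many changes. Hence $\mathrm{dist}(G,\mathcal H)\le f_K(\mathbf p)$ for every such $G$ and every $K\in\mathcal K(\mathcal H)$, and so $\mathrm{dist}_n(\mathbf p,\mathcal H)\le f(\mathbf p)$.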

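B: Equality of $f$ and $g$.

By the definition of $g_K(\mathbf p)$, it is easy to see that $g_K(\mathbf p)\le f_K(\mathbf p)$ for every density vector $\mathbf p$. Therefore, $g(\mathbf p)\le f(\mathbf p)$. For the other direction, we use $K$ and its optimal weight vector $\mathbf w^*=(w_1,\dots,w_k)$, where $w_i$ corresponds to $u_i\in V(K)$, in order to construct a sequence of types $\{K_\ell\}$ such that $\limsup_{\ell\to\infty}f_{K_\ell}(\mathbf p)\le g_K(\mathbf p)$.

First, choose $\ell$ large enough to ensure that $w_i\ell\ge2$ for $i=1,\dots,k$. Then, for each vertex $u_i\in V(K)$, create $\lfloor w_i\ell\rfloor$ copies of $u_i$ in the following sense. Let $u'_i$ and $u''_j$ be copies of $u_i$ and $u_j$, respectively, where $u_i,u_j\in V(K)$. Let $\phi$ be the coloring function of $K$ and $\phi'$ be the coloring function of $K_\ell$. If $i\neq j$, then $\phi'(u'_i,u''_j)=\phi(u_i,u_j)$. If $i=j$ and $u'_i\neq u''_i$, then $\phi'(u'_i,u''_i)=\phi(u_i,u_i)$. Finally, $\phi'(u'_i,u'_i)=\phi(u_i,u_i)$. Note that $K_\ell\in\mathcal K(\mathcal H)$: composing an embedding of $H$ into $K_\ell$ with the projection $u'_i\mapsto u_i$ would give an embedding of $H$ into $K$.

Thus, $M_{K_\ell}(\mathbf p)$ is a block matrix: the $(i,j)$-th block is a $\lfloor w_i\ell\rfloor\times\lfloor w_j\ell\rfloor$ matrix, and each entry of the $(i,j)$-th block is the same as the $(i,j)$-th entry of $M_K(\mathbf p)$.

If we denote the $(i,j)$-th entry of $M_K(\mathbf p)$ by $m_{ij}$, then

$$f_{K_\ell}(\mathbf p)=\frac{\mathbf 1^TM_{K_\ell}(\mathbf p)\mathbf 1}{\big(\sum_i\lfloor w_i\ell\rfloor\big)^2}=\frac{\sum_{i,j}\lfloor w_i\ell\rfloor\lfloor w_j\ell\rfloor\,m_{ij}}{\big(\sum_i\lfloor w_i\ell\rfloor\big)^2}\le\frac{\ell^2}{\big(\sum_i(w_i\ell-1)\big)^2}\,g_K(\mathbf p)=\frac{\ell^2}{(\ell-k)^2}\,g_K(\mathbf p).$$

Taking $\ell\to\infty$, we see that $\limsup_{\ell\to\infty}f_{K_\ell}(\mathbf p)\le g_K(\mathbf p)$. Consequently, for any $K\in\mathcal K(\mathcal H)$,

$$f(\mathbf p)=\inf_{K'\in\mathcal K(\mathcal H)}f_{K'}(\mathbf p)\le\limsup_{\ell\to\infty}f_{K_\ell}(\mathbf p)\le g_K(\mathbf p).$$

Taking the infimum over all $K\in\mathcal K(\mathcal H)$, we have $f(\mathbf p)\le g(\mathbf p)$.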
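The blow-up bound in part B can be checked numerically. Because each block of $M_{K_\ell}(\mathbf p)$ is constant, $f_{K_\ell}(\mathbf p)$ only needs the block sizes $c_i=\lfloor w_i\ell\rfloor$. The $2\times2$ matrix and weights below are a hypothetical example (not from the paper) with $M_K(\mathbf p)=\mathrm{diag}(0.8,0.2)$, whose optimal weights are $(0.2,0.8)$ and $g_K(\mathbf p)=0.16$.

```python
import math

def blowup_f(M, w, ell):
    """f_{K_ell}(p) for the blow-up with floor(w_i * ell) copies of u_i: the (i, j) block of
    M_{K_ell}(p) is constant m_ij, so 1^T M_{K_ell}(p) 1 = sum over i, j of c_i c_j m_ij."""
    c = [math.floor(wi * ell) for wi in w]
    k = len(w)
    num = sum(c[i] * c[j] * M[i][j] for i in range(k) for j in range(k))
    return num / sum(c) ** 2

M, w, g = [[0.8, 0.0], [0.0, 0.2]], [0.2, 0.8], 0.16
for ell in (10, 13, 50, 1000):
    print(ell, blowup_f(M, w, ell), ell ** 2 / (ell - 2) ** 2 * g)  # f_{K_ell} vs. the bound
```

The bound $\ell^2 g_K/(\ell-k)^2$ tends to $g_K(\mathbf p)$ as $\ell\to\infty$, as used in the proof.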
C: Lower bound on $\mathrm{dist}(\mathbf p,\mathcal H)$ using the random graph.

We apply Theorem 13, which is given in [5]. Theorem 13 is a corollary of Theorem 12, a relatively straightforward generalization to r-graphs and digraphs of a theorem by Alon, Fischer, Krivelevich and M. Szegedy [1] that is suitable for induced subgraphs.
In an r-graph, the density vector of a pair of disjoint sets of vertices $(V_i,V_j)$ is simply $d(V_i,V_j):=(d_1(V_i,V_j),\dots,d_r(V_i,V_j))$. So we can state the general version of the regularity lemma. For all definitions of regularity, see [1].

Theorem 12 (Alon, et al. [1]) Fix $r\ge2$. For every $m$ and function $\varepsilon:\mathbb N\to(0,1)$, there exist $S=S_{12}(r,m,\varepsilon)$ and $\delta=\delta_{12}(r,m,\varepsilon)$ with the following property:

If $G$ is a graph [r-graph, digraph] with $n\ge S$ vertices, then there exist an equipartition $\mathcal A=\{V_i : 1\le i\le k\}$ of $G$ and an induced subgraph [induced sub-r-graph, induced subdigraph] $G'$ of $G$, with an equipartition $\mathcal A'=\{V'_i : 1\le i\le k\}$ of the vertices of $G'$, that satisfy:

• $S\ge k\ge m$.

• $V'_i\subseteq V_i$ for all $i\ge1$, and $|V'_i|\ge\delta n$.

• In the equipartition $\mathcal A'$, all pairs are $\varepsilon(k)$-regular.

• All but at most $\varepsilon(0)\binom k2$ of the pairs $1\le i<i'\le k$ are such that $\|d(V_i,V_{i'})-d(V'_i,V'_{i'})\|_\infty<\varepsilon(0)$.

We use Theorem 12 in order to prove Theorem 13, which is the result that we need. 

Theorem 13 ([5]) Let $\mathcal H=\bigcap_{H\in\mathcal F(\mathcal H)}\mathrm{Forb}(H)$ be a hereditary property of r-graphs and let $\mathbf p=(p_1,\dots,p_r)$ be a density vector. Then there exists an r-type $K\in\mathcal K(\mathcal H)$, i.e., with $H\not\mapsto K$ for all $H\in\mathcal F(\mathcal H)$, such that, with probability going to 1 as $n\to\infty$, $\mathrm{dist}(G(n,\mathbf p),\mathcal H)\ge f_K(\mathbf p)-o(1)$.

The proof of Theorem 13 from Theorem 12 is straightforward, and the details are given in [5]. We begin with $G$ distributed according to $G(n,\mathbf p)$ and typical in the sense that any Szemerédi partition will have every pair $n^{-0.4}$-regular. Let $G'$ be the r-graph in $\mathcal H$ of smallest distance from $G$ and apply Theorem 12. The resulting partition $\mathcal A'$ describes a type $K$, which must be in $\mathcal K(\mathcal H)$. Furthermore, the number of changes required to ensure that $G'$ has partition $\mathcal A$ is very close to $f_K(\mathbf p)\binom n2$, because almost every pair in $\mathcal A$ has the same density as in $\mathcal A'$.

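Combining Theorem 13 (note that $f_K(\mathbf p)\ge f(\mathbf p)$ for every $K\in\mathcal K(\mathcal H)$) with part A, we see that for any $\varepsilon>0$, with probability approaching 1 as $n\to\infty$,

$$f(\mathbf p)-\varepsilon/2\le\mathrm{dist}(G(n,\mathbf p),\mathcal H)\le\mathrm{dist}_n(\mathbf p,\mathcal H)\le f(\mathbf p).\qquad(1)$$

We can now combine A, B and C. Taking the limit of (1) as $n\to\infty$, we obtain that for all $\varepsilon>0$, $f(\mathbf p)-\varepsilon/2\le\mathrm{dist}(\mathbf p,\mathcal H)\le f(\mathbf p)$. Hence, $\mathrm{dist}(\mathbf p,\mathcal H)=f(\mathbf p)=g(\mathbf p)$. Moreover, we can replace the second term with $\mathbb E[\mathrm{dist}(G(n,\mathbf p),\mathcal H)]$ because that random variable is bounded (in $[0,1]$), and so (1) occurring with high probability implies that the random variable is concentrated around its mean, which approaches $\mathrm{dist}(\mathbf p,\mathcal H)$. This verifies parts (1), (2) and (3) of the theorem.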

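D: Continuity of $f$.

Because the set of r-types is countable, we can linearly order $\mathcal K(\mathcal H)$ as $K_1,K_2,\dots$. For every density vector $\mathbf p$, set $m_\ell(\mathbf p)=\min_{1\le i\le\ell}f_{K_i}(\mathbf p)$.

We want to show that each function $m_\ell$ is Lipschitz with coefficient 1 with respect to the $L^1$ metric. Let $\mathbf p=(p_1,\dots,p_r)$ and $\mathbf q=(q_1,\dots,q_r)$ be density vectors, and choose r-types $K_{\mathbf p},K_{\mathbf q}\in\{K_1,\dots,K_\ell\}$ on $k_{\mathbf p}$ and $k_{\mathbf q}$ vertices, respectively, such that $m_\ell(\mathbf p)=f_{K_{\mathbf p}}(\mathbf p)$ and $m_\ell(\mathbf q)=f_{K_{\mathbf q}}(\mathbf q)$. On density vectors, $M_K(\mathbf p)(i,j)=\sum_{\rho\notin\phi(u_i,u_j)}p_\rho$, so $M_K$ extends linearly, and $M_K(\mathbf p-\mathbf q)$ denotes the matrix with entries $\sum_{\rho\notin\phi(u_i,u_j)}(p_\rho-q_\rho)$. Then, using the matrix definition of $f$ and the definition of $m_\ell$ as a minimum of linear functions,

$$\Big(\tfrac{1}{k_{\mathbf p}}\mathbf 1\Big)^TM_{K_{\mathbf p}}(\mathbf p-\mathbf q)\Big(\tfrac{1}{k_{\mathbf p}}\mathbf 1\Big)=f_{K_{\mathbf p}}(\mathbf p)-f_{K_{\mathbf p}}(\mathbf q)\le m_\ell(\mathbf p)-m_\ell(\mathbf q)\le f_{K_{\mathbf q}}(\mathbf p)-f_{K_{\mathbf q}}(\mathbf q)=\Big(\tfrac{1}{k_{\mathbf q}}\mathbf 1\Big)^TM_{K_{\mathbf q}}(\mathbf p-\mathbf q)\Big(\tfrac{1}{k_{\mathbf q}}\mathbf 1\Big).$$

Each entry of $M_{K_{\mathbf p}}(\mathbf p-\mathbf q)$ and $M_{K_{\mathbf q}}(\mathbf p-\mathbf q)$ has absolute value at most $\|\mathbf p-\mathbf q\|_1$, and the two quadratic forms above are averages over the $k_{\mathbf p}^2$ and $k_{\mathbf q}^2$ entries, respectively. Hence,

$$|m_\ell(\mathbf p)-m_\ell(\mathbf q)|\le\|\mathbf p-\mathbf q\|_1.$$

Since the functions $m_\ell$ are all Lipschitz with the same constant, the sequence $\{m_\ell\}_{\ell\ge1}$ is equicontinuous in the sense of Definition 7.22 of Rudin [15]. The sequence is also pointwise bounded above by $f_{K_1}(\mathbf p)$ and below by 0. By Theorem 7.25(b) of [15], the sequence $\{m_\ell\}_{\ell\ge1}$ has a uniformly convergent subsequence. Since $\{m_\ell\}_{\ell\ge1}$ is equicontinuous, each member is itself continuous, and Theorem 7.12 of [15] gives that the aforementioned uniformly convergent subsequence has a continuous limit. The monotonicity of $\{m_\ell\}_{\ell\ge1}$ gives that the limit of any subsequence is the same as the pointwise limit of the sequence itself, namely $\lim_{\ell\to\infty}m_\ell(\mathbf p)=\inf_{K\in\mathcal K(\mathcal H)}f_K(\mathbf p)=\mathrm{dist}(\mathbf p,\mathcal H)$.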

E: Concavity.

Let $\mathbf p_1$ and $\mathbf p_2$ be density vectors and let $t\in[0,1]$ be a real number. Observe that $t\mathbf p_1+(1-t)\mathbf p_2$ is still a density vector and, hence, in the domain. Furthermore, since each $f_K$ is linear in $\mathbf p$,

$$f(t\mathbf p_1+(1-t)\mathbf p_2)=\inf_{K\in\mathcal K(\mathcal H)}\{tf_K(\mathbf p_1)+(1-t)f_K(\mathbf p_2)\}\ge t\inf_{K\in\mathcal K(\mathcal H)}\{f_K(\mathbf p_1)\}+(1-t)\inf_{K\in\mathcal K(\mathcal H)}\{f_K(\mathbf p_2)\}=tf(\mathbf p_1)+(1-t)f(\mathbf p_2).$$

This gives concavity.

Using D and E, we obtain part (4) directly, and the fact that $\mathrm{dist}(\mathbf p,\mathcal H)$ achieves its maximum follows from continuity (and compactness of the set of density vectors) and Theorem 4.16 of [15]. Let $S$ be the set of density vectors $\mathbf p$ such that $\mathrm{dist}(\mathbf p,\mathcal H)=\mathrm{dist}(\mathcal H)$. The set $S$ must be convex, because if $\mathrm{dist}(\mathbf p_1,\mathcal H)=\mathrm{dist}(\mathbf p_2,\mathcal H)=\mathrm{dist}(\mathcal H)$, then by concavity the line segment connecting $\mathbf p_1$ and $\mathbf p_2$ must consist of vectors in $S$. The set $S$ must be closed because a corollary to Theorem 4.8 of [15] says that, under a continuous mapping, the inverse image of a closed set is closed. Since $\mathrm{dist}(\mathbf p,\mathcal H)$ is a continuous function and $S$ is the inverse image of the closed set $\{\mathrm{dist}(\mathcal H)\}$, $S$ is closed. This verifies parts (4), (5) and (6) of the theorem and concludes the proof. □



2.9.3. Proof of Theorem 11 

(1) In order to destroy all copies of a triangle with all edges of color 1 in an arbitrary coloring of $K_n$, it is sufficient to equipartition the vertex set into two parts and recolor all edges within these parts with color 2. This requires at most $\frac12\binom n2$ changes. To see the lower bound, consider $K_n$ with all edges colored 1. After all editing is done, color class 1 is triangle-free and so, by Mantel's theorem, has at most $n^2/4$ edges. Thus, at least $\binom n2 - \frac{n^2}{4} = \frac12\binom n2 + o(n^2)$ edges must have been changed.
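The counts in part (1) are easy to tabulate. In this Python sketch (the function names are ours), the cost of the equipartition strategy, $\binom{\lfloor n/2\rfloor}{2} + \binom{\lceil n/2\rceil}{2}$, is compared with the Mantel lower bound $\binom n2 - \lfloor n^2/4\rfloor$ for the all-1 coloring; the two agree exactly, since $\lfloor n/2\rfloor\lceil n/2\rceil = \lfloor n^2/4\rfloor$.

```python
from math import comb


def equipartition_cost(n):
    """Upper bound: edges inside the two parts of an equipartition of K_n."""
    a, b = n // 2, n - n // 2
    return comb(a, 2) + comb(b, 2)


def mantel_lower_bound(n):
    """Recolorings forced in the all-1 coloring: C(n, 2) minus the Mantel bound floor(n^2/4)."""
    return comb(n, 2) - n * n // 4


for n in (10, 11, 100, 1001):
    print(n, mantel_lower_bound(n), equipartition_cost(n), comb(n, 2) / 2)
```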

(2) In order to destroy all such triangles, it suffices to equipartition the vertex set into two parts and recolor all edges within these parts with color 3. This requires at most $\frac12\binom n2$ changes. To see the lower bound, consider $K_n$ on a vertex set with equipartition $V_1\cup V_2$. Let all edges between $V_1$ and $V_2$ be colored 1 and let all edges within the parts $V_i$, $i = 1,2$, be colored 2. We may assume that the only editing operations are recoloring an edge of color 1 into color 3 and recoloring an edge of color 2 into color 3, because this editing never creates a forbidden triangle. Let $c$ be such a recoloring containing no triangle with two edges of color 1 and one edge of color 2. Let $G$ be an auxiliary graph corresponding to the edges of color 3 in this coloring. The complement of $G$ cannot have any triangle with vertices in both $V_1$ and $V_2$, since such a triangle would have two edges of color 1 between the parts and one edge of color 2 inside a part. It is easy to prove by induction on $n$ that a graph satisfying this condition has at most $\frac12\binom n2 + o(n^2)$ edges. Therefore $G$ has at least $\frac12\binom n2 - o(n^2)$ edges, and this corresponds to the number of changes made.



(3) Assume that $\mathcal F$ consists of a triangle with all edges colored 1 and a triangle with all edges colored 2. In order to destroy both of these triangles in any coloring, as in the previous case, it is sufficient to equipartition the vertex set into two parts and recolor all edges within these parts with color 3. This requires at most $\frac12\binom n2$ changes.

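As to the lower bound, fix $\mathbf p = (1/2, 1/2, 0)$ and consider a 3-type $K\in\mathcal K(\mathcal H)$ on $k$ vertices. Each of the vertices must have color set $\{3\}$, since a vertex whose color set contains 1 or 2 admits a mapping of the corresponding monochromatic triangle. The pairs whose color set contains 1 form a triangle-free graph, so by Turán's theorem at least $\binom k2 - \lfloor k^2/4\rfloor = \lceil (k^2-2k)/4\rceil$ edges cannot have color 1; likewise, at least $\lceil (k^2-2k)/4\rceil$ edges cannot have color 2. Hence, the sum of the off-diagonal entries of $M_K(\mathbf p)$ is at least $\lceil (k^2-2k)/4\rceil + \lceil (k^2-2k)/4\rceil$, while each of the $k$ diagonal entries equals 1. So, for any such $K$,

$$f_K(\mathbf p) \ge \frac{1}{k^2}\left[k + 2\left\lceil\frac{k^2-2k}{4}\right\rceil\right] \ge \frac{1}{k^2}\left[k + \frac{k^2-2k}{2}\right] = \frac12.$$

As a result, $\inf_{K\in\mathcal K(\mathcal H)} f_K(\mathbf p)\ge 1/2$.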
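The counting in the lower bound of part (3) can be double-checked with exact arithmetic. The Python sketch below (the names are ours) verifies the identity $\binom k2 - \lfloor k^2/4\rfloor = \lceil (k^2-2k)/4\rceil$ and evaluates the resulting bound $\frac{1}{k^2}\left[k + 2\lceil (k^2-2k)/4\rceil\right]$, which equals $1/2$ for even $k$ and is slightly larger for odd $k$.

```python
from fractions import Fraction
from math import comb


def ceil_div(a, b):
    """Ceiling of a / b for integers a and b > 0."""
    return -(-a // b)


def pairs_without_color(k):
    """Pairs of a k-vertex type that must avoid a fixed color: C(k, 2) - floor(k^2 / 4)."""
    return comb(k, 2) - k * k // 4


def f_lower_bound(k):
    """(1/k^2) * [k diagonal entries equal to 1 + an off-diagonal sum of at least
    twice the number of pairs that must avoid a fixed color]."""
    return Fraction(k + 2 * pairs_without_color(k), k * k)


for k in range(1, 8):
    print(k, pairs_without_color(k), ceil_div(k * k - 2 * k, 4), f_lower_bound(k))
```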



(4) It suffices to make the coloring monochromatic, for instance by recoloring all edges of colors 1 and 2 into color 3. As a result, all forbidden colored triangles are destroyed; recoloring into the most frequent color instead, at most $\frac23\binom n2$ changes are needed. In fact, for fixed $\mathbf p = (p_1,p_2,p_3)$, at most $(1-\max\{p_1,p_2,p_3\})\binom n2$ changes suffice.

To see the lower bound, consider a 3-type $K\in\mathcal K(\mathcal H)$ on $k$ vertices. The vertices must be monochromatic and, in addition, the edges incident to a vertex must share the color of that vertex. Otherwise, there would be a bichromatic triangle $H$ with $H\mapsto K$. This implies, however, that $K$ must be entirely monochromatic. Hence, $g_K(\mathbf p)\ge 1-\max\{p_1,p_2,p_3\}$.

Note that this determines not only $\mathrm{dist}(\mathcal H)$, but the entire function $\mathrm{dist}(\mathbf p,\mathcal H) = 1-\max\{p_1,p_2,p_3\}$.

(5) Observe that in order to destroy all rainbow triangles using colors 1, 2 and 3, it is sufficient to recolor the smallest of these color classes into one of the other two colors, thus changing at most a $\min\{p_1,p_2,p_3\}$ proportion of the edges.

For the lower bound, simply observe that no edge in any $K\in\mathcal K(\mathcal H)$ can be trichromatic. Otherwise, that edge, together with either of its endpoints, admits a mapping of a rainbow triangle. Hence, each entry of $M_K(\mathbf p)$ is at least $\min\{p_1,p_2,p_3\}$, and so $f_K(\mathbf p)\ge\min\{p_1,p_2,p_3\}$. Therefore $\mathrm{dist}(\mathbf p,\mathcal H) = \min\{p_1,p_2,p_3\}$ and $\mathrm{dist}(\mathcal H) = 1/3$. □
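Parts (4) and (5) determine the whole edit distance functions, so their maxima can be located by brute force. The Python sketch below, an illustration rather than part of the proof, evaluates $1-\max\{p_1,p_2,p_3\}$ and $\min\{p_1,p_2,p_3\}$ on a rational grid of density vectors. Because the grid contains $(1/3,1/3,1/3)$, it recovers the maximum values $2/3$ and $1/3$.

```python
from fractions import Fraction


def dist_bichromatic(p):
    """Part (4): dist(p, H) = 1 - max(p)."""
    return 1 - max(p)


def dist_rainbow(p):
    """Part (5): dist(p, H) = min(p)."""
    return min(p)


N = 30  # grid step 1/N; N divisible by 3 puts (1/3, 1/3, 1/3) on the grid
grid = [(Fraction(a, N), Fraction(b, N), Fraction(N - a - b, N))
        for a in range(N + 1) for b in range(N + 1 - a)]
best4 = max(grid, key=dist_bichromatic)
best5 = max(grid, key=dist_rainbow)
print(dist_bichromatic(best4), dist_rainbow(best5))  # 2/3 1/3
```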



3. Directed graphs 
3.1. Basic definitions



We give a number of definitions that are similar to those for r-graphs; however, there are some important distinctions.

Definition 14 A simple directed graph or digraph is defined to be a pair $(V, E)$, where $V$ is a labeled vertex set, $E\subseteq (V)_2$, and $(V)_2$ denotes the set $V\times V - \{(v,v) : v\in V\}$. We will also view this as a coloring; that is, a digraph is a pair $(V, c)$, where $c : (V)_2\to\{\bigcirc,\leftarrow,\rightarrow,-\}$ is a function which has the property that, for distinct $v,w$,

• $c(v,w) = c(w,v)$ if and only if $c(v,w)\in\{\bigcirc,-\}$, and

• $c(v,w) = \rightarrow$ if and only if $c(w,v) = \leftarrow$.

Let $\mathcal A := \{\bigcirc,\leftarrow,\rightarrow,-\}$. Here we interpret the color $c(v,w) = \bigcirc$ to mean that neither $(v,w)$ nor $(w,v)$ is in $E$, the color $c(v,w) = -$ to mean that both $(v,w)$ and $(w,v)$ are in $E$, and the color $c(v,w) = \rightarrow$ to mean that $(v,w)\in E$ and $(w,v)\notin E$.
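Definition 14 can be mirrored directly in code. The Python sketch below encodes a digraph $(V,E)$ as a coloring of ordered pairs and checks the two symmetry conditions; the string labels standing for $\bigcirc$, $-$, $\rightarrow$ and $\leftarrow$ are our own choice.

```python
from itertools import permutations

NONEDGE, BOTH, FWD, BWD = "o", "-", "->", "<-"   # labels chosen for this sketch


def coloring_from_arcs(vertices, arcs):
    """Encode a simple digraph (V, E) as the coloring c of Definition 14."""
    c = {}
    for v, w in permutations(vertices, 2):
        vw, wv = (v, w) in arcs, (w, v) in arcs
        c[v, w] = BOTH if vw and wv else FWD if vw else BWD if wv else NONEDGE
    return c


def is_valid_coloring(vertices, c):
    """Check the two symmetry rules of Definition 14 for all distinct v, w."""
    for v, w in permutations(vertices, 2):
        if (c[v, w] == c[w, v]) != (c[v, w] in (NONEDGE, BOTH)):
            return False
        if (c[v, w] == FWD) != (c[w, v] == BWD):
            return False
    return True


V = [1, 2, 3]
c = coloring_from_arcs(V, {(1, 2), (2, 1), (2, 3)})
print(c[1, 2], c[2, 3], c[3, 2], c[1, 3])   # - -> <- o
print(is_valid_coloring(V, c))               # True
```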

For any digraph $G$ on a fixed vertex set $\{v_1,\dots,v_n\}$, disjoint vertex sets $V_i$ and $V_j$, and color $\rho\in\mathcal A$, the expression $E_\rho(V_i)$ denotes the set of pairs $\{v,v'\}$ with $v,v'\in V_i$, $v$ preceding $v'$ in the labeling, and $c(v,v') = \rho$. The expression $E_\rho(V_i,V_j)$ denotes the set of pairs $(v,v')$ with $v\in V_i$, $v'\in V_j$ and $c(v,v') = \rho$. Hence, $E_\leftarrow(V_i,V_j) = E_\rightarrow(V_j,V_i)$. As it happens, we will be able to assume, as in the proof of Theorem 8, that our graphs are random. We will also be able to assume that, among the pairs that have directed edges, $\leftarrow$ is as likely as $\rightarrow$. Hence, we postpone the definition of a density vector for directed graphs.


Definition 15 We say that $\mathcal P\subseteq\mathcal A$ is a palette if either none or both of "$\rightarrow$" and "$\leftarrow$" are in $\mathcal P$. There are 5 possible nontrivial palettes:

0. $\mathcal P_0 = \mathcal A$ is the most general case.

1. $\mathcal P_{\mathrm{compl}} = \{-,\leftarrow,\rightarrow\}$ is the case of simple digraphs such that every pair of vertices has at least one arc between them.

2. $\mathcal P_{\mathrm{orien}} = \{\bigcirc,\leftarrow,\rightarrow\}$ is the case of oriented graphs; that is, no pair of vertices has two arcs between them.

3. $\mathcal P_{\mathrm{undir}} = \{\bigcirc,-\}$ is the case of simple, undirected graphs.

4. $\mathcal P_{\mathrm{tourn}} = \{\leftarrow,\rightarrow\}$ is the case of tournaments.

The palette is the universe in which the editing takes place. That is, if $\bigcirc$ is not in the palette, then no pair $(v,w)$ can be changed to color $\bigcirc$ in the editing process.

If $\mathcal P$ is a fixed palette and $G = (V,c)$ and $G' = (V,c')$ are digraphs with colors in $\mathcal P$, then $\mathrm{dist}(G,G')$ is the proportion of pairs on which the colors differ; i.e., the number of pairs on which the colors differ, divided by $\binom n2$.³ We may call this the normalized edit distance between $G$ and $G'$. For any property $\mathcal H$, a simple digraph $G$ with all edge-colors in palette $\mathcal P$, and an integer $n$, we define $\mathrm{dist}(G,\mathcal H)$, $\mathrm{dist}(n,\mathcal H)$ and $\mathrm{dist}(\mathcal H)$ similarly to the multicolor case.
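The normalized edit distance is a simple count over unordered pairs. Here is a minimal Python sketch (the function name and the color labels are ours), with colorings stored as dictionaries on ordered pairs as in Definition 14:

```python
from itertools import combinations, permutations
from math import comb


def normalized_edit_distance(vertices, c1, c2):
    """Fraction of the C(n, 2) pairs on which c1 and c2 differ.

    Comparing each pair in one orientation suffices, because the color of
    (v, w) determines the color of (w, v).
    """
    diff = sum(1 for v, w in combinations(vertices, 2) if c1[v, w] != c2[v, w])
    return diff / comb(len(vertices), 2)


V = [0, 1, 2]
empty = {(v, w): "o" for v, w in permutations(V, 2)}
edited = dict(empty)
edited[0, 1] = edited[1, 0] = "-"          # add one bidirectional edge
print(normalized_edit_distance(V, empty, edited))   # one of the three pairs differs
```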

A hereditary property of digraphs with respect to palette $\mathcal P$ (or simply a hereditary property, where the context is understood) is a set of digraphs with all edge-colors in $\mathcal P$ that is closed under vertex deletion and isomorphism. We say that a digraph $G'$ is an induced subdigraph of $G$ if $G'$ can be obtained from $G$ by vertex deletion. For a fixed palette $\mathcal P$ and a digraph $H$, the family $\mathrm{Forb}(H)$ (the palette will be understood) consists of all digraphs with edge-colors in $\mathcal P$ that have no induced copies of $H$.

3 Here, we can talk about pairs because the color of the pair (v, w) determines the color of the pair (w, v). 
For every palette $\mathcal P$ and every hereditary property $\mathcal H$ with respect to $\mathcal P$, there is a family $\mathcal F(\mathcal H)$ of digraphs such that $\mathcal H = \bigcap_{H\in\mathcal F(\mathcal H)}\mathrm{Forb}(H)$. If $\mathcal F$ is a family of digraphs, then we use $\mathrm{Forb}(\mathcal F)$ to denote $\bigcap_{H\in\mathcal F}\mathrm{Forb}(H)$.

3.2. The directed chromatic numbers 



Definition 16 For a hereditary property $\mathcal H = \bigcap_{H\in\mathcal F(\mathcal H)}\mathrm{Forb}(H)$ and a palette $\mathcal P$, a weakly-good triple $(a_0,a_1,a_2)$ is a triple of nonnegative integers such that for some $H\in\mathcal F(\mathcal H)$, the vertex set $V(H)$ can be partitioned into sets $S_0,S_1,S_2$ such that, for each $i\in\{0,1,2\}$ with $a_i\neq 0$, the partition can be further refined as $S_i = V_{i,1}\cup\cdots\cup V_{i,a_i}$ and

• each $V_{0,j}$ does not induce a nonedge (i.e., does not induce an edge of color $\bigcirc$),

• each $V_{1,j}$ is such that the directed edges induced by $V_{1,j}$ form an acyclic digraph, and

• each $V_{2,j}$ does not induce a bidirectional edge (i.e., does not induce an edge of color $-$).

The weak clique spectrum of $\mathcal H$ with respect to a palette $\mathcal P$ is the set of all triples $(a_0,a_1,a_2)$ that are NOT weakly-good and such that $a_0 = 0$ if $\bigcirc\notin\mathcal P$, $a_1 = 0$ if $\{\rightarrow,\leftarrow\}\cap\mathcal P = \emptyset$, and $a_2 = 0$ if $-\notin\mathcal P$. The weak directed chromatic number $\chi^{\mathrm{wk,dir}}_{\mathcal P}(\mathcal H)$ of $\mathcal H$ with respect to a palette $\mathcal P$ is the maximum $\ell+1$ such that for some nonnegative integers $a_0,a_1,a_2$ with $a_0+a_1+a_2 = \ell$, the triple $(a_0,a_1,a_2)$ is in the weak clique spectrum of $\mathcal H$. We merely use $\chi^{\mathrm{wk,dir}}(\mathcal H)$ for the weak directed chromatic number if the palette is understood.

For a hereditary property $\mathcal H = \bigcap_{H\in\mathcal F(\mathcal H)}\mathrm{Forb}(H)$ and a palette $\mathcal P$, a strongly-good triple $(a_0,a_1,a_2)$ is a triple of nonnegative integers such that for some $H\in\mathcal F(\mathcal H)$, the vertex set $V(H)$ can be partitioned into sets $S_0,S_1,S_2$ such that, for each $i\in\{0,1,2\}$ with $a_i\neq 0$, the partition can be further refined as $S_i = V_{i,1}\cup\cdots\cup V_{i,a_i}$ and

• each $V_{0,j}$ induces only nonedges (i.e., all edges are of color $\bigcirc$),

• each $V_{1,j}$ is such that the directed edges induced by $V_{1,j}$ form a transitive tournament, and

• each $V_{2,j}$ induces only bidirectional edges (i.e., all edges are of color $-$).

The strong clique spectrum of $\mathcal H$ with respect to a palette $\mathcal P$ is the set of all triples $(a_0,a_1,a_2)$ that are NOT strongly-good and such that $a_0 = 0$ if $\bigcirc\notin\mathcal P$, $a_1 = 0$ if $\{\rightarrow,\leftarrow\}\cap\mathcal P = \emptyset$, and $a_2 = 0$ if $-\notin\mathcal P$. The strong directed chromatic number $\chi^{\mathrm{st,dir}}_{\mathcal P}(\mathcal H)$ of $\mathcal H$ with respect to a palette $\mathcal P$ is the maximum $\ell+1$ such that for some nonnegative integers $a_0,a_1,a_2$ with $a_0+a_1+a_2 = \ell$, the triple $(a_0,a_1,a_2)$ is in the strong clique spectrum of $\mathcal H$. We merely use $\chi^{\mathrm{st,dir}}(\mathcal H)$ for the strong directed chromatic number if the palette is understood.

Remark 17

• The weak [strong] clique spectrum with respect to a given palette is again a down-set in the partially ordered set of triples ordered coordinatewise. That is, if $(a_0,a_1,a_2)$ is in the weak [strong] clique spectrum and $(a_0',a_1',a_2')$ has the property that $0\le a_i'\le a_i$ for $i = 0,1,2$, then $(a_0',a_1',a_2')$ is also in that weak [strong] clique spectrum.

• For any palette $\mathcal P$ and any hereditary property of digraphs $\mathcal H$, $\chi^{\mathrm{wk,dir}}_{\mathcal P}(\mathcal H)\le\chi^{\mathrm{st,dir}}_{\mathcal P}(\mathcal H)$.

• If the palette $\mathcal P\in\{\mathcal P_{\mathrm{undir}},\mathcal P_{\mathrm{tourn}}\}$, the weak and strong directed chromatic numbers are equal, so in those cases we can use $\chi^{\mathrm{dir}}_{\mathcal P} = \chi^{\mathrm{wk,dir}}_{\mathcal P} = \chi^{\mathrm{st,dir}}_{\mathcal P}$.

• If the palette is $\mathcal P_{\mathrm{undir}} = \{\bigcirc,-\}$, then $\chi^{\mathrm{dir}}(\mathcal H)$ is the binary chromatic number of the hereditary property $\mathcal H$.

• If $\mathcal H = \mathrm{Forb}(H)$ and the palette is $\mathcal P_{\mathrm{tourn}} = \{\leftarrow,\rightarrow\}$, the case of tournaments, then $\chi^{\mathrm{dir}}(\mathcal H)$ is the smallest number of transitive subtournaments into which $V(H)$ can be partitioned.



3.3. A simple editing algorithm 



Let $\mathcal P$ be a palette and let $\mathcal H$ be a hereditary property of digraphs such that $\mathcal H = \bigcap_{H\in\mathcal F(\mathcal H)}\mathrm{Forb}(H)$ and each edge of each $H\in\mathcal F(\mathcal H)$ has a color in $\mathcal P$. Further, let $\ell = \chi^{\mathrm{wk,dir}}_{\mathcal P}(\mathcal H) - 1$ and let $(a_0,a_1,a_2)$ be in the weak clique spectrum with $\sum_{i=0}^{2} a_i = \ell$. Recall that if a color is not in the palette, then its corresponding $a_i$ value must be set to zero.

Partition $V$ into 3 sets, $S_0,S_1,S_2$, and further refine the partition such that $S_i = V_{i,1}\cup\cdots\cup V_{i,a_i}$ for $i = 0,1,2$. Then recolor the edges induced by each $V_{i,j}$ as follows:

• If $i = 0$, then recolor the edges colored $\bigcirc$ into some other arbitrary color in the palette.

• If $i = 1$, then recolor the edges colored $\leftarrow$ and $\rightarrow$ so that there are no directed cycles among those directed edges.

• If $i = 2$, then recolor the edges colored $-$ into some other arbitrary color in the palette.

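This new coloring does not contain any $H\in\mathcal F(\mathcal H)$; otherwise the triple $(a_0,a_1,a_2)$ would be weakly good for some $H$. As in the multicolor case, if the partition into the sets $V_{i,j}$ is an equipartition, then only the pairs inside the $\ell$ parts are recolored, and so

$$\mathrm{dist}(\mathcal H)\le\frac{1}{\chi^{\mathrm{wk,dir}}_{\mathcal P}(\mathcal H) - 1}.$$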
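The simple editing algorithm can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. The parts $V_{i,j}$ are formed by a round-robin equipartition, the text's "arbitrary" replacement colors are fixed by a preference order of our choosing, and arcs inside a part of kind $i = 1$ are all oriented from the smaller to the larger vertex label, which rules out directed cycles. Colors are stored on ordered pairs with our own string labels, and the triple used in the example is not checked against any particular $\mathcal H$.

```python
NONEDGE, BOTH, FWD, BWD = "o", "-", "->", "<-"   # labels chosen for this sketch
FLIP = {FWD: BWD, BWD: FWD, NONEDGE: NONEDGE, BOTH: BOTH}


def simple_edit(n, c, a, palette):
    """Simple editing algorithm of Section 3.3 (illustrative sketch).

    c maps ordered pairs (v, w), v != w, of range(n) to colors; a = (a0, a1, a2)
    is assumed to lie in the weak clique spectrum with a0 + a1 + a2 >= 1.
    Returns the edited coloring and the number of recolored pairs.
    """
    kinds = [i for i in range(3) for _ in range(a[i])]  # kind i of each part V_{i,j}
    part = {v: v % len(kinds) for v in range(n)}         # round-robin equipartition
    new, changes = dict(c), 0
    for v in range(n):
        for w in range(v + 1, n):
            if part[v] != part[w]:
                continue                                  # pairs between parts are kept
            kind, old = kinds[part[v]], c[v, w]
            col = old
            if kind == 0 and old == NONEDGE:              # parts V_{0,j}: remove nonedges
                col = next(x for x in (BOTH, FWD) if x in palette)
            elif kind == 1 and old == BWD:                # parts V_{1,j}: arcs point to larger labels
                col = FWD
            elif kind == 2 and old == BOTH:               # parts V_{2,j}: remove bidirectional edges
                col = next(x for x in (NONEDGE, FWD) if x in palette)
            if col != old:
                new[v, w], new[w, v] = col, FLIP[col]
                changes += 1
    return new, changes


# Usage: the complete bidirected graph on 12 vertices in the undirected palette,
# split into three parts of kind 2.
n = 12
complete = {(v, w): BOTH for v in range(n) for w in range(n) if v != w}
edited, changes = simple_edit(n, complete, (0, 0, 3), {NONEDGE, BOTH})
print(changes, n * (n - 1) // 2)   # 18 of the 66 pairs are recolored
```

Here $\ell = 3$ and $18/66 = 3/11$, below the bound $1/\ell = 1/3$; the gap is a finite-$n$ effect.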



3.4. Main results 



In Section 2, we have seen a general bound in the r-graph case. Here, we show a similar result in the 
directed case. 

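Theorem 18 Let $\mathcal P$ be a palette and $\mathcal H$ be a hereditary property of digraphs. Let $\chi^{\mathrm{wk,dir}}_{\mathcal P} = \chi^{\mathrm{wk,dir}}_{\mathcal P}(\mathcal H)$ and $\chi^{\mathrm{st,dir}}_{\mathcal P} = \chi^{\mathrm{st,dir}}_{\mathcal P}(\mathcal H)$ be the weak and strong directed chromatic numbers, respectively, of $\mathcal H$. Recall that if $\mathcal P\in\{\mathcal P_{\mathrm{undir}},\mathcal P_{\mathrm{tourn}}\}$, then $\chi^{\mathrm{dir}}_{\mathcal P} = \chi^{\mathrm{wk,dir}}_{\mathcal P} = \chi^{\mathrm{st,dir}}_{\mathcal P}$. Then,

0. $\frac{1}{4(\chi^{\mathrm{st,dir}}_{\mathcal P}-1)}\le\mathrm{dist}(\mathcal H)\le\frac{1}{\chi^{\mathrm{wk,dir}}_{\mathcal P}-1}$, if $\mathcal P = \mathcal P_0 = \{\bigcirc,\leftarrow,\rightarrow,-\}$.

1. $\frac{1}{3(\chi^{\mathrm{st,dir}}_{\mathcal P}-1)}\le\mathrm{dist}(\mathcal H)\le\frac{1}{\chi^{\mathrm{wk,dir}}_{\mathcal P}-1}$, if $\mathcal P = \mathcal P_{\mathrm{compl}} = \{-,\leftarrow,\rightarrow\}$.

2. $\frac{1}{3(\chi^{\mathrm{st,dir}}_{\mathcal P}-1)}\le\mathrm{dist}(\mathcal H)\le\frac{1}{\chi^{\mathrm{wk,dir}}_{\mathcal P}-1}$, if $\mathcal P = \mathcal P_{\mathrm{orien}} = \{\bigcirc,\leftarrow,\rightarrow\}$.

3. $\frac{1}{2(\chi^{\mathrm{dir}}_{\mathcal P}-1)}\le\mathrm{dist}(\mathcal H)\le\frac{1}{\chi^{\mathrm{dir}}_{\mathcal P}-1}$, if $\mathcal P = \mathcal P_{\mathrm{undir}} = \{\bigcirc,-\}$.

4. $\mathrm{dist}(\mathcal H) = \frac{1}{2(\chi^{\mathrm{dir}}_{\mathcal P}-1)}$, if $\mathcal P = \mathcal P_{\mathrm{tourn}} = \{\leftarrow,\rightarrow\}$.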

We prove Theorem 18 in Section 3.10. As in the multicolor case, the upper bound is a consequence of the simple editing algorithm. The lower bound comes from Theorem 23, stated below, which is the digraph version of Theorem 8 and deals with computing the edit distance for given hereditary properties of digraphs. To that end, we need to investigate the so-called edit distance function, which computes the edit distance of a digraph whose nonedges, directed edges and bidirectional edges have specified densities.



3.5. The edit distance function 



3.5.1. Preliminary definitions For a digraph $G = (V,c)$ with $c : (V)_2\to\mathcal A$ and $c$ having the required symmetries as in Definition 14, partition $(V)_2$ as follows:

• $E_\bigcirc(G)$ is the set of all unordered pairs $\{v,w\}$ such that $c(v,w) = \bigcirc$,

• $E_\leftarrow(G)$ is the set of all ordered pairs $(v,w)$ such that $c(v,w) = \leftarrow$,

• $E_\rightarrow(G)$ is the set of all ordered pairs $(v,w)$ such that $c(v,w) = \rightarrow$, and

• $E_-(G)$ is the set of all unordered pairs $\{v,w\}$ such that $c(v,w) = -$.

The definition of a density vector in the r-graph case does not translate well to the directed case because of the asymmetry that results from directed edges, so we give a new definition.

Given a palette $\mathcal P$, a directed density vector $(p,q)$ with respect to $\mathcal P$ (or simply density vector or probability vector, where the context is understood) is a nonnegative real vector with the property that $p+2q\le 1$. Furthermore,

1. If $\mathcal P = \mathcal P_{\mathrm{compl}} = \{-,\leftarrow,\rightarrow\}$, then $p+2q = 1$.

2. If $\mathcal P = \mathcal P_{\mathrm{orien}} = \{\bigcirc,\leftarrow,\rightarrow\}$, then $p = 0$ and $q\le 1/2$.

3. If $\mathcal P = \mathcal P_{\mathrm{undir}} = \{\bigcirc,-\}$, then $q = 0$ and $p\le 1$. This is the r-graph case with $r = 2$, or simply the case of undirected graphs; see [3] and [4].

4. If $\mathcal P = \mathcal P_{\mathrm{tourn}} = \{\leftarrow,\rightarrow\}$, then $p = 0$ and $1-p-2q = 0$, so $q = 1/2$.

For any density vector $(p,q)$ and an integer $n$, we denote⁴

$$\mathrm{dist}_n((p,q),\mathcal H) = \max\left\{\mathrm{dist}(G,\mathcal H) : |V(G)| = n,\ |E_-(G)| = p\binom n2,\ |E_\leftarrow(G)| = |E_\rightarrow(G)| = q\binom n2,\ |E_\bigcirc(G)| = (1-p-2q)\binom n2\right\}.$$

Observe that there are, in fact, four densities here; two are equal and all sum to one. Thus, we only need two parameters. We choose the parameter names as above because the case $q = 0$ gives the classical case of undirected graphs, as we see below. Later in the paper, we show that the following limit exists, which we call the edit distance function:

$$\mathrm{dist}((p,q),\mathcal H) = \lim_{n\to\infty}\mathrm{dist}_n((p,q),\mathcal H).$$

Having the edit distance function, we see that $\mathrm{dist}(\mathcal H) = \max_{(p,q)}\mathrm{dist}((p,q),\mathcal H)$, where the maximum is taken over all density vectors that are valid under the conditions imposed by the palette.

4 Formally, the sizes of the parts of the partition of the edge set should be integral, so we can take the floor function for the sizes of, say, $E_-$, $E_\leftarrow$ and $E_\rightarrow$, and the size of $E_\bigcirc$ is what remains. Since we fix $p$ and $q$ and let $n$ approach infinity, this makes no appreciable difference.

3.5.2. Types of colorings In Section 3.6.1, we define two functions in terms of dir-types, which allow us to compute the edit distance function. Later in the paper, we shall provide algorithms for such computations.

Definition 19 For a palette $\mathcal P$, a $\mathcal P$-dir-type (or dir-type or type, where the context and the palette are understood) $K$ is a pair $(U,\varphi)$, where $U$ is a finite set of vertices and $\varphi : U\times U\to 2^{\mathcal P}\setminus\{\emptyset\}$, such that

• for distinct $x,y$ and $a\in\{\bigcirc,-\}$, $a\in\varphi(x,y)$ if and only if $a\in\varphi(y,x)$,

• for distinct $x,y$, $\rightarrow\in\varphi(x,y)$ if and only if $\leftarrow\in\varphi(y,x)$, and

• $\varphi(x,x)\neq\mathcal P$.⁵

The sub-dir-type of $K$ induced by $W\subseteq U$ is the dir-type obtained by deleting the vertices $U\setminus W$ from $K$.

We say that a digraph $H = (V,c)$ embeds in type $K = (U,\varphi)$ if there is a map $\gamma : V\to U$ such that for all vertices $v\neq v'$,

• if $\gamma(v)\neq\gamma(v')$, then $c(v,v')\in\varphi(\gamma(v),\gamma(v'))$,

• if $c_0\in\{\bigcirc,-\}$ and $c_0\notin\varphi(u,u)$, then $\gamma^{-1}(u)$ has no pair with color $c_0$,

• if $\{\leftarrow,\rightarrow\}\cap\varphi(u,u) = \emptyset$, then $\gamma^{-1}(u)$ has no directed edge, and

• if $|\{\leftarrow,\rightarrow\}\cap\varphi(u,u)| = 1$, then $\gamma^{-1}(u)$ has no directed cycle.

In other words, there is a mapping $\gamma$ that brings each edge of color $c_0$ to a vertex or an edge containing $c_0$ in its color set, except that if a vertex contains exactly one of $\{\leftarrow,\rightarrow\}$, then the preimage of that vertex can be ordered transitively with respect to the oriented edges. If $H$ embeds in type $K$, we write $H\mapsto K$; otherwise we write $H\not\mapsto K$. For every hereditary property $\mathcal H$, we let $\mathcal K(\mathcal H)$ be the set of all dir-types such that no member of $\mathcal F(\mathcal H)$ embeds in that type, i.e.,

$$\mathcal K(\mathcal H) = \{K : K\text{ is a dir-type},\ H\not\mapsto K\ \forall H\in\mathcal F(\mathcal H)\}.$$

We say that a digraph $G' = (V,c)$ has type $K = (U,\varphi)$ if $G'$ embeds into $K$ with a mapping $\gamma : V\to U$ that is surjective.

We have Fact 20, also similar to the r-graph case, which generalizes the ideas underlying the simple editing algorithm in Section 3.3.

Fact 20 If $K$ is a dir-type, $G'$ is of type $K$ and $H$ does not embed into $K$, then $H$ is not an induced subdigraph of $G'$.



5 Note that it is possible that $|\{\leftarrow,\rightarrow\}\cap\varphi(x,x)| = 1$.
3.6. Editing algorithm using types



Let $\mathbf w = (w_1,\dots,w_k)$ be a density vector and let $(p_\bigcirc,p_\leftarrow,p_\rightarrow,p_-)$ be a density vector. This latter vector will represent a vector of densities: the number of ordered pairs $(x,y)$ with color $-$ will be $p_-(n)_2$ and the number of ordered pairs with color $\bigcirc$ will be $p_\bigcirc(n)_2$; the number of ordered pairs with color $\leftarrow$ is $p_\leftarrow(n)_2$ and the number of ordered pairs with color $\rightarrow$ is $p_\rightarrow(n)_2$. Consequently, $p_\bigcirc+p_\leftarrow+p_\rightarrow+p_- = 1$ and, since $c(x,y) = \leftarrow$ exactly when $c(y,x) = \rightarrow$, $p_\leftarrow = p_\rightarrow$.

The vector $\mathbf w$ will represent a vector of weights, assigned to the vertices $u_1,\dots,u_k$ of a dir-type, respectively.

Let $\mathcal P\subseteq\{\bigcirc,\leftarrow,\rightarrow,-\}$ be a palette, $\mathcal H$ be a hereditary property and $G = (V,c)$ be a digraph with colors in $\mathcal P$ whose density vector is $(p_\bigcirc,p_\leftarrow,p_\rightarrow,p_-)$. In order to find an upper bound on $\mathrm{dist}(G,\mathcal H)$, it is sufficient to change $G$ to a digraph such that, for all $H\in\mathcal F(\mathcal H)$, $H$ is not an induced subdigraph of the new coloring. In particular, if the resulting coloring has type $K\in\mathcal K(\mathcal H)$, then this coloring is in $\mathcal H$.

Algorithm 21 Fix such a $K = (U,\varphi)\in\mathcal K(\mathcal H)$ and try to bring $G$ to a coloring of type $K$ by edge-recoloring. Let $U = \{u_1,\dots,u_k\}$. Partition the vertices of $G$ randomly into sets $V_1,\dots,V_k$ such that the probability of a vertex being in part $V_i$ is $w_i$. With an ordering of the vertices of $G$ and vertices $x<y$, consider an edge $(x,y)$ of $G$, and let $x\in V_i$, $y\in V_j$ for $i,j\in\{1,\dots,k\}$. If $i\neq j$ and $c(x,y)\notin\varphi(u_i,u_j)$, recolor $(x,y)$ with a color from $\varphi(u_i,u_j)$.

Next, consider the edges in $V_i$. If $\varphi(u_i,u_i)$ contains exactly one of $\{\leftarrow,\rightarrow\}$, then consider a random order $\sigma$ of the vertices of $V_i$. Let $x<y$ with both in $V_i$. If $c(x,y) = \leftarrow$, then recolor $(x,y)$ if and only if $\sigma(x)<\sigma(y)$. If $c(x,y) = \rightarrow$, then recolor $(x,y)$ if and only if $\sigma(x)>\sigma(y)$. Note that this forces $V_i$ to have no directed cycles. If $a\notin\varphi(u_i,u_i)$ for some $a\in\{\bigcirc,-\}$, then recolor any edge with color $a$ to a color in $\varphi(u_i,u_i)$. This concludes the algorithm.

Algorithm 21 is simply a directed graph version of Algorithm 7. We only needed to address the 
editing of oriented edges. 
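Algorithm 21 is randomized and easy to simulate. The Python sketch below is one possible reading of it, with several assumptions that go beyond the text and are flagged in the comments: replacement colors are the first allowed color in a fixed sorted order; a single random order $\sigma$ of all vertices is restricted to each part; a nonedge or bidirectional pair recolored inside a part whose color set contains an arrow is oriented along $\sigma$; and directed edges inside a part whose color set contains no arrow are recolored to an allowed color, a case the algorithm statement leaves implicit but the analysis in Section 3.6.1 accounts for. The type $\varphi$ is a dictionary on index pairs, and the color labels are ours.

```python
import random

NONEDGE, BOTH, FWD, BWD = "o", "-", "->", "<-"   # labels chosen for this sketch
FLIP = {FWD: BWD, BWD: FWD, NONEDGE: NONEDGE, BOTH: BOTH}
ARROWS = {FWD, BWD}


def edit_to_type(n, c, phi, w, rng):
    """Recolor c (on ordered pairs of range(n)) toward a coloring of type K = (U, phi).

    phi[i, j] is the color set of (u_i, u_j); phi[i, i] is the color set of u_i.
    Returns the new coloring, the random partition, and the number of changed pairs.
    """
    k = len(w)
    part = {x: rng.choices(range(k), weights=w)[0] for x in range(n)}
    sigma = list(range(n))
    rng.shuffle(sigma)               # sigma[x] = position of x in a random order
    new = dict(c)

    def set_color(x, y, col):        # keep the symmetry of Definition 14
        new[x, y], new[y, x] = col, FLIP[col]

    for x in range(n):
        for y in range(x + 1, n):
            i, j, col = part[x], part[y], c[x, y]
            allowed = phi[i, j]
            if i != j:
                if col not in allowed:
                    set_color(x, y, sorted(allowed)[0])   # assumption: first allowed color
                continue
            along = FWD if sigma[x] < sigma[y] else BWD   # orientation agreeing with sigma
            arrows = allowed & ARROWS
            if col in ARROWS and len(arrows) == 1 and col != along:
                set_color(x, y, along)                    # reverse a sigma-backward arc
            elif col in (NONEDGE, BOTH) and col not in allowed:
                others = sorted(allowed - ARROWS)
                set_color(x, y, others[0] if others else along)   # assumption
            elif col in ARROWS and not arrows:
                set_color(x, y, sorted(allowed)[0])       # assumption: case left implicit
    changes = sum(new[x, y] != c[x, y] for x in range(n) for y in range(x + 1, n))
    return new, part, changes


# Usage: a two-vertex type whose first vertex admits only arcs, whose second
# vertex admits only nonedges, and whose edge admits {-, ->}.
rng = random.Random(3)
n = 40
c = {}
for x in range(n):
    for y in range(x + 1, n):
        col = rng.choice([NONEDGE, BOTH, FWD, BWD])
        c[x, y], c[y, x] = col, FLIP[col]
phi = {(0, 0): {FWD}, (1, 1): {NONEDGE}, (0, 1): {BOTH, FWD}, (1, 0): {BOTH, BWD}}
new, part, changes = edit_to_type(n, c, phi, [0.5, 0.5], rng)
part0 = [x for x in range(n) if part[x] == 0]
scores = sorted(sum(new[x, y] == FWD for y in part0 if y != x) for x in part0)
print(changes, scores == list(range(len(part0))))   # part 0 is now a transitive tournament
```

Distinct out-degrees $0,1,\dots,m-1$ inside part 0 certify that the arcs there form a transitive tournament, i.e., that the part has no directed cycle.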



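3.6.1. Analysis of the editing algorithm Let us first consider a pair $(x,y)$. If $c(x,y)\in\{\bigcirc,-\}$, then the probability that the color of $(x,y)$ is unchanged is

$$\sum_{1\le i,j\le k} w_iw_j\mathbf 1_{c(x,y)\in\varphi(u_i,u_j)}.$$

If $c(x,y)\in\{\leftarrow,\rightarrow\}$, then the probability that the color of $(x,y)$ is unchanged is

$$\sum_{\substack{1\le i,j\le k\\ i\neq j}} w_iw_j\mathbf 1_{c(x,y)\in\varphi(u_i,u_j)} + \sum_{i=1}^{k} w_i^2\,\frac{|\{\leftarrow,\rightarrow\}\cap\varphi(u_i,u_i)|}{2} = \sum_{1\le i,j\le k} w_iw_j\,\frac{|\{\leftarrow,\rightarrow\}\cap\varphi(u_i,u_j)|}{2}.$$

The equality holds because, by the symmetry of $\varphi$, the off-diagonal sum is the same whether $c(x,y) = \leftarrow$ or $c(x,y) = \rightarrow$, so it equals the average of the two. It doesn't matter whether we consider the pair $(u_i,u_j)$ or $(u_j,u_i)$ in the last term because $|\{\leftarrow,\rightarrow\}\cap\varphi(u_i,u_j)|$ is invariant under exchanging $i$ and $j$.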
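Now, the expected number of changes is

$$\mathbb E[\#\text{changes}] = \binom n2 - \sum_{\substack{x,y\in V\\ x<y}}\Pr((x,y)\text{ is not changed}) = \binom n2\sum_{1\le i,j\le k} w_iw_j\left[1 - p_\bigcirc\mathbf 1_{\bigcirc\in\varphi(u_i,u_j)} - p_-\mathbf 1_{-\in\varphi(u_i,u_j)} - p_\rightarrow\,|\{\leftarrow,\rightarrow\}\cap\varphi(u_i,u_j)|\right].$$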

Let $p = p_-$ and $q = p_\leftarrow = p_\rightarrow$, so that $1-p-2q = p_\bigcirc$. For $K = (U,\varphi)$ and $\rho\in\{\bigcirc,-\}$, let $A_\rho$ be the $\{0,1\}$-matrix whose $(i,j)$th entry is 1 if $\rho\in\varphi(u_i,u_j)$ and zero otherwise, and let $A_\rightarrow$ be the matrix with entries

$$(A_\rightarrow)_{ij} = |\{\leftarrow,\rightarrow\}\cap\varphi(u_i,u_j)|.$$

With $J$ denoting the $k\times k$ all-ones matrix, we define

$$M_K(\mathbf p) = J - (1-p-2q)A_\bigcirc - pA_- - qA_\rightarrow.$$

Consequently, if $\mathbf w = (w_1,\dots,w_k)$, then $\mathbb E[\#\text{changes}] = \mathbf w^TM_K(\mathbf p)\mathbf w\binom n2$.

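As in the r-graph case, we define two functions in terms of the matrix $M_K(\mathbf p)$:

• $f_K(\mathbf p) = \left(\frac1k\mathbf 1\right)^TM_K(\mathbf p)\left(\frac1k\mathbf 1\right)$ and

• $g_K(\mathbf p) = \min\{\mathbf w^TM_K(\mathbf p)\mathbf w : \mathbf w^T\mathbf 1 = 1,\ \mathbf w\ge 0\}$.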
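For small types, $M_K(\mathbf p)$, $f_K$ and $g_K$ can be evaluated directly. The Python sketch below builds $M_K(\mathbf p)$ entrywise from the definition above with exact rational arithmetic. Since $g_K$ is a quadratic program, the sketch only bounds it from above by a brute-force search over weight vectors with a fixed denominator; this is our simplification, adequate only for very small $k$. The example type, two vertices of color $\rightarrow$ joined by an edge with color set $\{\leftarrow,\rightarrow\}$ at $\mathbf p = (0,1/2)$, is ours.

```python
from fractions import Fraction
from itertools import product

NONEDGE, BOTH, FWD, BWD = "o", "-", "->", "<-"   # labels chosen for this sketch


def M_K(phi, k, p, q):
    """Entries of M_K(p) = J - (1 - p - 2q) A_o - p A_- - q A_->."""
    return [[1 - (1 - p - 2 * q) * (NONEDGE in phi[i, j]) - p * (BOTH in phi[i, j])
             - q * len(phi[i, j] & {FWD, BWD}) for j in range(k)] for i in range(k)]


def quad(M, w):
    """The quadratic form w^T M w."""
    return sum(w[i] * M[i][j] * w[j] for i in range(len(w)) for j in range(len(w)))


def f_K(phi, k, p, q):
    return quad(M_K(phi, k, p, q), [Fraction(1, k)] * k)


def g_K_upper(phi, k, p, q, N=24):
    """Min of w^T M_K(p) w over weights with denominator N (an upper bound on g_K)."""
    M = M_K(phi, k, p, q)
    best = None
    for head in product(range(N + 1), repeat=k - 1):
        if sum(head) <= N:
            w = [Fraction(x, N) for x in head] + [Fraction(N - sum(head), N)]
            val = quad(M, w)
            best = val if best is None else min(best, val)
    return best


p, q = Fraction(0), Fraction(1, 2)
phi = {(0, 0): {FWD}, (1, 1): {FWD}, (0, 1): {FWD, BWD}, (1, 0): {FWD, BWD}}
print(f_K(phi, 2, p, q), g_K_upper(phi, 2, p, q))   # 1/4 1/4
```

The value $1/4$ matches $\frac{1}{2(\chi^{\mathrm{dir}}-1)}$ for $\chi^{\mathrm{dir}} = 3$ in Theorem 25, for which this two-vertex clique is the extremal type.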

Note 22 In the directed case, each ordered pair can receive one of 4 colors, but the density vectors have only two entries rather than three. This is because the above computation shows that an upper bound on editing any digraph is determined not by the pair $(p_\leftarrow,p_\rightarrow)$ but only by $q = (p_\leftarrow+p_\rightarrow)/2$. It is straightforward, by the same arguments as in the proof of Theorem 8, to see that the lower bound for the maximum edit distance is asymptotically achieved by a random digraph in which the probability of a forward arc is equal to the probability of a backward arc.



3.7. Basic results on digraphs 



Theorem 23 is a parallel to Theorem 8 and summarizes some facts about the edit distance function. Recall that, depending on the palette, there may be further restrictions on the density vector other than the necessary $p+2q\le 1$. The dimension $r$ of the palette $\mathcal P$ is the number of members of $\{\bigcirc,\rightarrow,-\}$ that $\mathcal P$ has.

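Theorem 23 Let $\mathcal H$ be a hereditary property of digraphs and $\mathcal P$ a palette. Fix a density vector with respect to $\mathcal P$, $\mathbf p = (p,q)$. The limit $\mathrm{dist}(\mathbf p,\mathcal H) := \lim_{n\to\infty}\mathrm{dist}_n(\mathbf p,\mathcal H)$ exists. Moreover,

1. $\mathrm{dist}(\mathbf p,\mathcal H) = \inf_{K\in\mathcal K(\mathcal H)} f_K(\mathbf p) = \inf_{K\in\mathcal K(\mathcal H)} g_K(\mathbf p)$;

2. Fix $\varepsilon>0$; then with probability approaching 1 as $n\to\infty$,

$$\mathrm{dist}(\mathbf p,\mathcal H)-\varepsilon\le\mathrm{dist}(G(n,\mathbf p),\mathcal H)\le\mathrm{dist}(\mathbf p,\mathcal H);$$

3. $\mathrm{dist}(\mathbf p,\mathcal H) = \lim_{n\to\infty}\mathbb E[\mathrm{dist}(G(n,\mathbf p),\mathcal H)]$;

4. $\mathrm{dist}(\mathbf p,\mathcal H)$ is continuous over the domain of density vectors with respect to $\mathcal P$ and is concave down;

5. $\mathrm{dist}(\mathbf p,\mathcal H)$ achieves its maximum, $\mathrm{dist}(\mathcal H)$, at some density vector $\mathbf p^*_{\mathcal H}$ (in fact, denote the set of all such vectors by $\mathbf p^*_{\mathcal H}$), and so

$$\mathrm{dist}(\mathcal H) = \lim_{n\to\infty}\mathbb E[\mathrm{dist}(G(n,\mathbf p^*_{\mathcal H}),\mathcal H)];\text{ and}$$

6. Both $\mathbf p^*_{\mathcal H}$ and $\mathrm{dist}(\mathcal H)$ exist, and $\mathbf p^*_{\mathcal H}$ is a convex and closed set in $[0,1]^{r-1}$.

Note 24 Again, we abuse notation so that $\mathbf p^*_{\mathcal H}$ can be a single vector or a set.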



3.8. Example: tournaments 



The case of tournaments is relatively straightforward. Because tournaments have no edges labeled $\bigcirc$ or $-$, there is only one density vector, $\mathbf p = (0,1/2)$. This means that we only need to consider random tournaments, in which each arc is forward independently with probability 1/2. This leads to a rather simple expression for the edit distance:

Theorem 25 Let $\mathcal H$ be a nontrivial hereditary property of tournaments and let $\mathcal P = \mathcal P_{\mathrm{tourn}} = \{\leftarrow,\rightarrow\}$. Then,

$$\mathrm{dist}(\mathcal H) = \frac{1}{2(\chi^{\mathrm{dir}}_{\mathcal P}(\mathcal H)-1)}.$$

Note that in the case of tournaments, the directed chromatic number $\chi^{\mathrm{dir}}_{\mathcal P_{\mathrm{tourn}}}(\mathrm{Forb}(H))$ of a tournament $H$ is the smallest number of transitive subtournaments into which $H$ can be partitioned. We prove Theorem 25 in Section 3.10.3.
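For small tournaments, the number of transitive subtournaments needed can be found by exhaustive search. The Python sketch below (the names are ours; it takes exponential time, so it is only for tiny examples) uses the fact that a tournament is transitive exactly when its out-degrees are pairwise distinct, and then evaluates the formula of Theorem 25.

```python
from fractions import Fraction
from itertools import product


def is_transitive(S, arcs):
    """A tournament on S is transitive iff its out-degrees are pairwise distinct."""
    outdeg = [sum((v, u) in arcs for u in S if u != v) for v in S]
    return len(set(outdeg)) == len(S)


def min_transitive_partition(n, arcs):
    """Fewest transitive subtournaments partitioning the tournament on range(n)."""
    for m in range(1, n + 1):
        for labels in product(range(m), repeat=n):
            parts = [[v for v in range(n) if labels[v] == j] for j in range(m)]
            if all(is_transitive(P, arcs) for P in parts):
                return m
    return n  # reached only when n == 0


directed_triangle = {(0, 1), (1, 2), (2, 0)}
chi = min_transitive_partition(3, directed_triangle)
dist = Fraction(1, 2 * (chi - 1))   # Theorem 25; requires chi >= 2
print(chi, dist)   # 2 1/2
```

For the directed triangle this gives $1/2$, in agreement with part 1 of Theorem 26.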



3.9. Example: triangles 



Theorem 26 gives some basic results on examples of hereditary properties of digraphs defined by triangles. The proof is in Section 3.10.4.

Theorem 26 Consider hereditary properties of digraphs.

1. If $\mathcal F$ is a family that consists of a single directed triangle, then, regardless of the palette, $\mathrm{dist}(\mathrm{Forb}(\mathcal F)) = 1/2$.

2. If $\mathcal F$ is a family that consists of a single transitive triangle and $\mathcal P = \mathcal P_{\mathrm{tourn}}$, the palette of tournaments, then $\mathrm{Forb}(\mathcal F)$ is a trivial hereditary property.

3. If $\mathcal F$ is a family that consists of a single transitive triangle and $\mathcal P$ is any palette other than $\mathcal P_{\mathrm{tourn}}$, then $\mathrm{dist}(\mathrm{Forb}(\mathcal F)) = 1/2$.

4. If $\mathcal F$ is a family that consists of both a transitive and a directed triangle, and $\mathcal P$ is any palette other than $\mathcal P_{\mathrm{tourn}}$, then $\mathrm{dist}(\mathrm{Forb}(\mathcal F)) = 1/2$.
3.10. Proofs



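3.10.1. Proof of Theorem 18 The upper bound for this theorem is proven by the simple editing algorithm from Section 3.3.

Let $r = |\mathcal P|$. For the lower bound, we apply part 1 of Theorem 23, which states that $\mathrm{dist}(\mathbf p,\mathcal H) = \inf_{K\in\mathcal K(\mathcal H)} f_K(\mathbf p)$. Consider an arbitrary $K = (U,\varphi)\in\mathcal K(\mathcal H)$, a $\mathcal P$-dir-type on $k$ vertices. Let $\hat K$ be a graph with vertex set $U$ such that $u$ and $u'$ are adjacent in $\hat K$ if and only if $\varphi(u,u') = \mathcal P$. We observe that $\hat K$ has no clique on $\chi^{\mathrm{st,dir}}_{\mathcal P}$ vertices; otherwise, for some $H\in\mathcal F(\mathcal H)$, $H\mapsto K$. Using Turán's theorem, the number of edges of $\hat K$ is at most $\frac{\chi^{\mathrm{st,dir}}_{\mathcal P}-2}{\chi^{\mathrm{st,dir}}_{\mathcal P}-1}\cdot\frac{k^2}{2}$. Let $\mathbf p = \frac1r\mathbf 1$, the density vector in which each color of $\mathcal P$ has density $1/r$. Consider the matrix $M_K(\mathbf p)$ and observe that every entry is either zero or a positive integer multiple of $1/r$. The zero entries correspond exactly to pairs with $\varphi$ value equal to $\mathcal P$. Thus, $M_K(\mathbf p)$ has at least

$$k^2 - 2\cdot\frac{\chi^{\mathrm{st,dir}}_{\mathcal P}-2}{\chi^{\mathrm{st,dir}}_{\mathcal P}-1}\cdot\frac{k^2}{2} = \frac{k^2}{\chi^{\mathrm{st,dir}}_{\mathcal P}-1}$$

entries with value at least $1/r$. Therefore, $f_K(\mathbf p) = \frac{1}{k^2}\mathbf 1^TM_K(\mathbf p)\mathbf 1$ is at least $\frac{1}{r(\chi^{\mathrm{st,dir}}_{\mathcal P}-1)}$. Since $K$ was arbitrary, this gives a lower bound for $\mathrm{dist}(\mathbf p,\mathcal H)$, and hence for $\mathrm{dist}(\mathcal H)$.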



3.10.2. Proof of Theorem 23 The proof of most of this theorem is identical to that of Theorem 8, which is found in Section 2.9.2. The only significant wrinkle is the upper bound. That is, if $G$ is a digraph with $p\binom n2$ edges with color $-$ and $(1-p-2q)\binom n2$ edges with color $\bigcirc$, then, with $\mathbf p = (p,q)$,

$$\mathrm{dist}(G,\mathcal H)\le\inf_{K\in\mathcal K(\mathcal H)} f_K(\mathbf p).$$

This follows directly from the analysis of the editing algorithm using types from Section 3.6.



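3.10.3. Proof of Theorem 25

In this case, $\mathbf p = (0,1/2)$. Let $\mathcal H$ be a hereditary property of tournaments and $\chi^{\mathrm{dir}} = \chi^{\mathrm{dir}}_{\mathcal P_{\mathrm{tourn}}}(\mathcal H)$. In any type $K$ on $k$ vertices, the vertices have color $\rightarrow$ and the edges either have one direction or both. By the definition of the directed chromatic number, $H\mapsto K$ for some $H\in\mathcal F(\mathcal H)$ if $K$ has a clique of order $\chi^{\mathrm{dir}}$ such that every edge of the clique has color set $\{\leftarrow,\rightarrow\}$.

Similar to the argument in Section 3.10.1, we can use Turán's theorem to find a lower bound for $f_K(\mathbf p)$. The bilinear form $\mathbf 1^TM_K(\mathbf p)\mathbf 1$ counts $\frac12|V(K)| + \frac12|E_\leftarrow(K)| + \frac12|E_\rightarrow(K)|$, where $E_\rho(K)$ is the set of ordered pairs with color set $\{\rho\}$. Since $|E_{\{\leftarrow,\rightarrow\}}(K)| + |E_\leftarrow(K)| + |E_\rightarrow(K)| = k(k-1)$, Turán's theorem gives that $|E_{\{\leftarrow,\rightarrow\}}(K)|\le\frac{\chi^{\mathrm{dir}}-2}{\chi^{\mathrm{dir}}-1}k^2$. Consequently,

$$f_K(\mathbf p) = \frac{1}{k^2}\mathbf 1^TM_K(\mathbf p)\mathbf 1 = \frac{1}{k^2}\left[\frac12k + \frac12\left(k(k-1) - |E_{\{\leftarrow,\rightarrow\}}(K)|\right)\right]\ge\frac{1}{k^2}\left[\frac12k + \frac12k(k-1) - \frac{\chi^{\mathrm{dir}}-2}{2(\chi^{\mathrm{dir}}-1)}k^2\right] = \frac{1}{2(\chi^{\mathrm{dir}}-1)}.$$

The matching upper bound is attained by the type consisting of a clique on $\chi^{\mathrm{dir}}-1$ vertices of color $\rightarrow$ whose edges all have color set $\{\leftarrow,\rightarrow\}$, for which $f_K(\mathbf p) = \frac{1}{(\chi^{\mathrm{dir}}-1)^2}\cdot\frac{\chi^{\mathrm{dir}}-1}{2} = \frac{1}{2(\chi^{\mathrm{dir}}-1)}$. This concludes the proof of Theorem 25.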
3.10.4. Proof of Theorem 26



(1) As to the upper bound, linearly order the vertices so that the number of backward edges (i.e., pairs $\{v_i,v_j\}$ such that $i<j$ and $c(v_i,v_j) = \leftarrow$) is minimized. A greedy ordering already results in at most half of the directed edges being backward. Reorient the backward edges so that they become forward edges; hence $\mathrm{dist}(\mathrm{Forb}(\mathcal F))\le 1/2$. Note that this corresponds to a $K$ that consists of a single vertex whose color set is $\mathcal P\setminus\{\leftarrow\}$.
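One concrete greedy ordering with the required property inserts the vertices one at a time, placing each new vertex at the front or at the back of the current order, whichever makes fewer of its arcs to already placed vertices backward. Since insertions at the ends never change the relative order of earlier vertices, at most half of all arcs end up backward. This Python sketch is our illustration of such an ordering, not necessarily the one intended in the text; arcs are given as ordered pairs $(u,v)$ meaning $u\rightarrow v$, and the example is a random tournament.

```python
import random


def greedy_order(vertices, arcs):
    """Insert vertices one by one at the front or back, whichever creates fewer backward arcs."""
    order = []
    for v in vertices:
        into_v = sum((u, v) in arcs for u in order)    # backward if v goes to the front
        out_of_v = sum((v, u) in arcs for u in order)  # backward if v goes to the back
        if into_v <= out_of_v:
            order.insert(0, v)
        else:
            order.append(v)
    return order


def backward_arcs(order, arcs):
    pos = {v: i for i, v in enumerate(order)}
    return [(u, v) for (u, v) in arcs if pos[u] > pos[v]]


rng = random.Random(7)
n = 30
arcs = {(u, v) if rng.random() < 0.5 else (v, u) for u in range(n) for v in range(u + 1, n)}
order = greedy_order(range(n), arcs)
back = backward_arcs(order, arcs)
print(len(back), len(arcs))   # reversing the arcs in `back` makes the tournament transitive
```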

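For the lower bound, consider an arbitrary $K\in\mathcal K(\mathcal H)$ with vertex set $\{u_1,\dots,u_k\}$ and $\mathbf p = (0,1/2)$. This means that $M_K(\mathbf p) = J - \frac12A_\rightarrow$, i.e., $(M_K(\mathbf p))_{ij} = 1 - \frac12|\varphi(u_i,u_j)\cap\{\leftarrow,\rightarrow\}|$.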

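Here we use an approach due to Sidorenko [18]; see also [12, 19, 20, 21]. For the optimal solution $\mathbf w^*$ to the quadratic program $g_K(\mathbf p) = \min\{\mathbf w^TM_K(\mathbf p)\mathbf w : \mathbf w^T\mathbf 1 = 1,\ \mathbf w\ge 0\}$, the vector $M_K(\mathbf p)\mathbf w^*$ is a constant vector, equal to $g_K(\mathbf p)\mathbf 1$. (We may assume that $\mathbf w^*$ is positive, after deleting the vertices of weight zero; this leaves a sub-dir-type that is still in $\mathcal K(\mathcal H)$.)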

Observe that there can be no entry $(M_K(\mathbf p))_{ii} = 0$, because that would mean that, for the corresponding vertex $u_i$, $\varphi(u_i,u_i)\supseteq\{\leftarrow,\rightarrow\}$, and a directed triangle maps to such a vertex. Suppose there is some entry $(M_K(\mathbf p))_{ij} = 0$. This implies that there is a pair of vertices $u_i$ and $u_j$ such that $\varphi(u_i,u_j)\supseteq\{\leftarrow,\rightarrow\}$. We observe that $|\varphi(u_i,u_i)\cap\{\leftarrow,\rightarrow\}| = |\varphi(u_j,u_j)\cap\{\leftarrow,\rightarrow\}| = 0$; otherwise the directed triangle would map to these two vertices of $K$. Consequently, $(M_K(\mathbf p))_{ii} = (M_K(\mathbf p))_{jj} = 1$. Moreover, for every $\ell\in\{1,\dots,k\}\setminus\{i,j\}$, we have $(M_K(\mathbf p))_{i\ell} + (M_K(\mathbf p))_{j\ell}\ge 1$. If not, then, without loss of generality, we have a triangle $\{u_i,u_j,u_\ell\}$ in $K$ such that two edges contain $\{\leftarrow,\rightarrow\}$ and the third contains one of $\{\leftarrow,\rightarrow\}$. It is easy to see that a directed triangle maps to three such vertices. But then, writing $\mathbf w = \mathbf w^*$,

$$\sum_\ell (M_K(\mathbf p))_{i\ell}w_\ell + \sum_\ell (M_K(\mathbf p))_{j\ell}w_\ell \ge (1-w_i-w_j) + (M_K(\mathbf p))_{ii}w_i + (M_K(\mathbf p))_{ij}w_j + (M_K(\mathbf p))_{ji}w_i + (M_K(\mathbf p))_{jj}w_j = (1-w_i-w_j) + w_i + 0 + 0 + w_j = 1.$$

Since each sum on the left-hand side must be equal to $g_K(\mathbf p)$, it must be that $g_K(\mathbf p)\ge 1/2$.

Finally, if there is no zero entry in the $i$th row of $M_K(\mathbf p)$, then $\sum_\ell (M_K(\mathbf p))_{i\ell}w_\ell\ge 1/2$, since every nonzero entry is at least $1/2$. Thus, in all cases, $g_K(\mathbf p)\ge 1/2$.

(2) Here we make the easily verified observation that any tournament with at least 4 vertices has a transitive subtournament of size 3. So, the hereditary property contains no tournaments of size 4 or more.

(3) As to the upper bound, equipartition the vertex set arbitrarily and recolor each edge inside either part to have a color not in $\{\leftarrow,\rightarrow\}$. Hence, $\mathrm{dist}(\mathrm{Forb}(\mathcal F))\le 1/2$. Note that this corresponds to a $K$ that consists of two vertices colored with some nonempty subset of $\{\bigcirc,-\}$ and an edge colored $\mathcal P$.

For the lower bound, simply let $G$ be a transitive tournament on $n$ vertices. After editing $G$ to make $G'$, no triangle of $G$ can remain, so the pairs whose colors were not changed form a triangle-free graph, and Mantel's theorem gives that

$$\mathrm{dist}(G,\mathrm{Forb}(\mathcal F))\ge\frac{\binom n2 - n^2/4}{\binom n2} = \frac12 - o(1).$$

(4) Here we can use the trivial fact that if $\mathcal H$ is the hereditary property that forbids both directed and transitive triangles and $\mathcal H'$ is the larger hereditary property in (3), which forbids only the transitive triangle, then $\mathrm{dist}(\mathcal H)\ge\mathrm{dist}(\mathcal H') = 1/2$. But the example above of a type $K$ that consists of two vertices with none of $\{\leftarrow,\rightarrow\}$ in their color sets is in $\mathcal K(\mathcal H)$ in this case as well. Hence, $f_K(\mathbf p)\le 1/2$ for every density vector $\mathbf p$, and so $\mathrm{dist}(\mathcal H) = 1/2$.
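The observation used in part (2) can be confirmed by brute force over all $2^6 = 64$ tournaments on four labeled vertices. In this Python sketch (the names are ours), a triangle is transitive exactly when its out-degrees are 0, 1 and 2.

```python
from itertools import combinations, product


def has_transitive_triangle(n, arcs):
    """True if some three vertices of the tournament induce a transitive triangle."""
    for a, b, c in combinations(range(n), 3):
        outdeg = sorted(sum((x, y) in arcs for y in (a, b, c) if y != x) for x in (a, b, c))
        if outdeg == [0, 1, 2]:
            return True
    return False


def all_tournaments(n):
    """Yield every tournament on range(n) as a set of arcs (u, v), meaning u -> v."""
    pairs = list(combinations(range(n), 2))
    for bits in product((0, 1), repeat=len(pairs)):
        yield {(u, v) if b else (v, u) for (u, v), b in zip(pairs, bits)}


print(all(has_transitive_triangle(4, T) for T in all_tournaments(4)))   # True
print(all(has_transitive_triangle(3, T) for T in all_tournaments(3)))   # False
```

A short argument explains the outcome: the out-degrees of a 4-vertex tournament sum to 6, so some vertex has two out-neighbors, and it forms a transitive triangle with them.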